Introduction




This  research  program  investigates  the  relationship  between  an  ATM  mutation  or  polymorphism 
(ATM  heterozygosity)  and  susceptibility  to  radiation-induced  breast  cancer  and  normal  tissue  injury 
following  breast  cancer  radiation  treatment.  Over  the  4-year  period,  we  plan  to  study  these  potentially 
important  relationships  by  sequencing  ATM  cDNA  from  200  breast  cancer  patients  and  50  breast  cancer 
patients  with  a  significant  radiation  injury.  Our  interest  in  studying  ATM  in  breast  cancer  and  radiation 
sensitivity  first  arose  after  obligate  ATM  heterozygotes  (family  members  of  children  with  ataxia 
telangiectasia)  were  noted  to  have  a  5-fold  greater  risk  of  developing  breast  cancer  than  the  general 
population  (1).  This  led  to  estimates  that  8%  of  all  breast  cancers  occur  in  ATM  heterozygotes  (2).  In 
addition, fibroblasts from obligate ATM heterozygotes were noted to be radiosensitive compared to controls
(3). Following cloning of the ATM gene, our preliminary work with cDNA sequencing and data from other
investigators suggest that the protein-truncating mutations commonly found in patients with
ataxia telangiectasia are uncommon in breast cancer patients. However, our data and those of others have
identified  a  series  of  missense  mutations  that  cause  a  single  amino  acid  change  in  the  protein  product  of  the 
ATM  gene.  This  has  led  some  authors  to  propose  that  the  changes  in  the  ATM  gene  may  be  different  for 
breast  cancer  development  and  the  disease  of  ataxia  telangiectasia.  Specifically,  they  suggest  that  it  is  single 
nucleotide  changes  in  the  gene  that  may  have  an  association  with  breast  cancer  development  (4).  However,  it 
is  yet  unclear  whether  these  mutations  result  in  functional  consequences  that  may  predispose  to  breast  cancer 
or  whether  they  represent  inconsequential  polymorphisms.  In  the  upcoming  years  of  this  program  we  will  be 
testing  whether  these  polymorphisms  are  more  frequently  found  in  breast  cancer  patients  compared  to 
controls.  We  have  developed  allele-specific  oligonucleotide  assays  to  compare  the  frequency  of  specific 
single  base  nucleotide  changes  seen  in  our  sequenced  breast  cancer  population  to  the  frequency  in  ethnically- 
matched  controls  drawn  from  a  group  of  960  individuals  who  donated  blood  for  this  project  during 
community-based  blood  drives.  In  addition,  we  plan  on  developing  in  vitro  assays  to  investigate  whether 
missense  mutations/polymorphisms  we  have  identified  result  in  functional  consequence.  The  phenotypes  we 
plan  to  evaluate  are  cellular  radiosensitivity  and  capacity  for  DNA  damage  repair. 
Body


The  first  specific  aim  in  our  statement  of  work  was  to  establish  the  incidence  of  ATM  heterozygosity 
in  200  breast  cancer  patients.  This  aim  was  to  be  completed  over  the  first  3  years  of  the  funding  period.  To 
achieve  this  aim,  we  had  developed  a  cDNA  sequencing  process  for  the  ATM  gene  prior  to  the  time  of 
awarding  of  this  grant.  The  sequencing  process  was  verified  by  confirming  the  presence  of  a  mutation  in  two 
obligate  ATM  heterozygotes  (parents  of  an  individual  with  the  disease  of  ataxia  telangiectasia).  During  the 
first  year  of  the  funding  period,  task  1  of  specific  aim  1  was  to  begin  sequencing  the  cDNA  of  breast  cancer 
patients.  Over  this  first  year,  we  enrolled  101  breast  cancer  patients  onto  our  institutional  protocols  studying 
the ATM gene. As specified in this task, all clinical and epidemiological information was
abstracted, coded, and recorded for all of these patients. From these samples, RNA was isolated and reverse
transcribed  into  cDNA  and  then  sequenced  for  mutations  in  ATM.  Sufficient  information  from  sequencing 
of  ATM  cDNA  was  obtained  in  89  subjects. 

Our  first  specific  aim  focused  on  studying  a  nonselected  breast  cancer  patient  population.  Our  second 
specific  aim  focused  on  a  second  set  of  breast  cancer  patients  selected  because  they  had  experienced  a 
significant  normal  tissue  radiation  injury.  Of  the  89  subjects  sequenced  thus  far,  68  breast  cancer  patients 
were  in  the  nonselected  population.  No  protein  truncating  mutations  were  found  in  this  cohort.  A  total  of  30 
patients had variation of the sequenced gene compared to the GenBank sequence. Specifically, a total of 41
single  base  changes  were  detected  in  the  30  patients.  Nine  patients  had  2  or  more  single  nucleotide  changes. 
From  these  patients,  we  identified  4  missense  mutations/polymorphisms  that  occurred  repeatedly  in  3  or 
more  of  the  89  patients.  The  specific  polymorphisms  and  their  frequency  are  listed  below: 


1.  Below is a portion of the ATM cDNA sequence with a C to G single nucleotide change:

1  tttttagtag  agacagggtt  tcaccatgtt  ggtcaggctg  gtcgaactcc  tgaccttgtg
61  atcctcccac  cttggcctcc  caaagtgctg  ggattacagg  cgtgagccac  tgcgtccagt
121  gtaaattata  cttttatttt  aatcctgcta  ctactgcaag  caaggcaaac  atttttgtgt
181  tacagcatta  cttgtataga  ttttaagaaa  atctcatttt  aaatacggaa  atgttaagaa
241  aaattattgt  gcctttgacc  agaatgtgcc  tctaattgta  cagttaaatc  taactataaa
301  tactgcagta  taaaataatt  atatacacat  tttttcacac  ctctttctct  ctatatatgc
361  atatatacat  atacatatat  atacctatat  gtattttttt  tacagacagt  gatgtgtgtt
421  ctgaaattgt  gaaccatgag  tctagtactt  aatgatctgc  ttatctgctg  ccgtcaacta
481  gaacatgata  gagctacaga  acgaaaggta  gtaaattact  taaattcaat  ttttnccttg
541  aaatgtgtga  ttagtaaccc  attattattt  tcctttttat  tttcagaaag  aagttgagaa
601  atttaagcgc  ctgattcgag  atcctgaaac  aattaaacat  ctagatcggc  attcagatt (c/g)
661  caaacaagga  aaatatttga  attgggatgc  tgtttttagg  tattctattc  aaatttattt
721  tactgtcttt  atttttctct  ttcatattta  tttctgttgt  gatattactt  ttgtgtgtaa
781  gtcttaacat  ttatctttgc  ttcctatata  tcattatgcc  ttgcatatga  atttggcatt
841  taatatttat  ccaaaacata  atttttaaag  gttgttcata  tagaaactta  aaaattataa
901  attatttctt  ccaataaaat  gtttt


The  resulting  change  in  the  protein  is  -  Ser49Cys 


This change was seen in 4 of 56 patients (the denominator is smaller than the total number of patients
because the entire gene sequence was not obtained in every patient sample).


2.  Below  is  a  portion  of  the  ATM  cDNA  sequence  with  a  G-A  single  nucleotide  change: 

1  aacttttgat  acctttttcc  ctcttctatc  atatcagaaa  actccttctg  ttactaacgc 
61  ttttcaactc  tgaaattgta  tttaaagtgg  tgatataatt  tcatttgtaa  gcaatattct

121  gtttcattat  ggtaatggcc  tagactggaa  ataaacagtt  acagtgtcac  taacatatat 

181  atttgatatt  gatatactag  cctagtgtgg  ttttttaaac  accacctaat  acatgttttt 

241  tgtttgtttt  tttagcagta  tgttgagttt  atggcagatt  aatctatcat  cttttagaaa 

301  tttaatatgt  caacggggca  tgaaaatttt  aagtaaaatg  tattaatttt  actcattttt 

361  actcaaacta  ttgggtggat  ttgtttgtat  attctaggtg  aaaactgact  tttgtcagac 

421  tgtacttcca  tacttgattc  atgatatttt  actccaa (g/a) at  acaaatgaat  catggagaaa 
481  tctgctttct  acacatgttc  agggattttt  caccagctgt  cttcgacact  tctcgcaaac 

541  gagccgatcc  acaacccctg  caaacttgga  ttcaggtatt  ctattaaatt  tttaacatta 

601  atactgtaaa  ctcagttcta  gagaaagatg  gatttaagat  ggaatcccac  taaaagcact 

661  ttacaggatt  aaatctataa  cctctaaatt  tgtttcttca  tctatggaat  ggagataaaa 

721  gttgccaaca  gttgcaacaa  gtttccaatg  aaataatgtg  tgtaaagtgc  ctaggatagt 

781  acttgatgta  tagtattccc  tcagcacatt  tcggctattg  ataatgggtc  aactaattga 

841  gctttcaata  tgtgtcaggc  actgtgcttg  cactggcaat  attaatgtga  aaaagaaaca 

901  cccacattct  agcaatggaa  aaaacaa 

The resulting change in the protein is - Asp1853Asn
This  change  was  seen  in  15  of  59  patients. 


3.  Below  is  a  portion  of  the  ATM  cDNA  sequence  with  a  T  to  C  single  nucleotide  change: 

1  ctcaagaggg  caaaaggatg  cgtaactgct  tacctatacc  agatgttaaa  ggtttttaaa 

61  cctatgctct  ttacttcctc  tgcttggtga  aaaaagggat  atgtttgcag  acaatgttat 

121  gcttaacatt  tatatctggt  gtttttaaaa  atactttctg  aatttgcctt  tgagattata 

181  acttgtattt  tttctctatc  tattagtaaa  atttgctact  gaataatgac  atttgatata 

241  agtaggtctc  aaagtccgaa  gaagagaagc  atttaaaaga  ataatctatt  aattatataa 

301  gtagtctttg  aatgatgtag  atactaggtt  aatgttttcc  tttgtaatat  attgctaata 

361  catataaggc  aaagcattag  gtacttggtt  tatatattaa  agatcttact  ttcttgaagt 

421  gaacaccacc  aaaaagataa  agaagaactt  tcattctcag  aagtagaaga  actatttctt 

481  cagacaactt  ttgacaagat  ggacttttta  accattgtga  gagaatgtgg  tatagaaaag 

541  caccagtcca  gtattggctt  ctctgtccac  cagaatctca  aggaatcact  ggatcgctgt 

601  cttctgggat  tatcagaaca  gcttctgaat  aattactca (t/c)  ctgaggtgag  attttttaaa 
661  aaaagaacta  agcttatata  tgattcaact  ttggtaaact  gttaggaagg  agaaataggg 

721  gcaggaaaaa  cagcaaggat  ggtgggaggc  ttcattttaa  aagcaaagtg  gcagtaaagg 

781  gctctaaatt  ggacaactta  gcataattaa  aggaaaactc  aagaataata  atttgagtac 

841  ttcctttgta  ctggaaatta  tggtagacat  aaaataattc  cttgtgtagg  ttagtgagga 

901  atagtaagag  tttgagcata  gggattatat  gatgaaaaaa  acctctaaat  acaaaggagg 

961  gaaatgttac  agtaatagaa  aagaacacga  tgtaaacaaa  tctaatagat  tttgg 


The  resulting  change  in  the  protein  is  -  Ser707Pro 
This  change  was  seen  in  4  of  57  patients. 


4.  Below is a portion of the ATM cDNA sequence with the detected C to G single nucleotide change:

1  ctcccaaagt  gttgggatta  caggtgtgag  ccagtgcacc  cagcctgaat  gtggttttga
61  aatcttcaat  ataccaaaca  aagaaatctt  caatatacca  aacaaagaaa  tttcttttaa
121  agtaagttta  ctgtaatgta  gtttagccat  tgtatggtag  cccccaaaaa  aggacataat
181  ggtataaaac  aaataatata  cgctaaaatg  aattctttta  cactaatttc  ttttagcttg
241  aatttttggc  aaggtgagta  tgttggcata  ttccacataa  tgacaaataa  gtttagcaca
301  gaaagacata  ttggaagtaa  cttataataa  cctttcagtg  agttttctga  gtgcttttat
361  cagaatgatt  atttaacttt  ggaaaactta  cttgatttgc  aggcatctaa  caaaggagag
421  gaaatatata  ttctctgtaa  gaatggccct  agtaaattgc  cttaaaactt  tgcttgaggt
481  gagtttttgc  atttttttag  taagatctcc  attgaaaatt  ttaaagcagt  ctttgtttgt
541  taatgagtaa  tttttctcta  tttcatattt  aaccacagtt  cttttcccgt  aggctgatc (c/g)
601  ttattcaaaa  tgggccattc  ttaatgtaat  gggaaaagac  tttcctgtaa  atgaagtatt
661  tacacaattt  cttgctgaca  atcatcacca  agttcgcatg  ttggctgcag  agtcaatcaa
721  taggtaatgg  gtcaaatatt  catgaagtat  ttggaatgct  gcagatggca  gtagaatgtc
781  ttacatagta  acagctcaca  gttgcaatat  taaaaatagc  taacacttgt  tgagtatata
841  cggtgtgcct  ggcatttatg  tttattctta  attcttatac  ttctgtcact  tagattctat
901  tatttccttc  aatttataaa  tga

The resulting change in the protein would be - Pro1054Arg


This  change  was  seen  in  1/61  patients. 
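
As an illustration of how each single nucleotide change listed above produces the stated amino acid substitution, the following minimal sketch applies the standard genetic code to one codon per variant. The codons and the position of the changed base within each codon are assumptions chosen to be consistent with the reported substitutions; the reading frame is not stated in this report, and the helper name `missense_change` is hypothetical.

```python
# Subset of the standard genetic code covering only the codons used below.
CODON_TO_AA = {
    "TCC": "Ser", "TGC": "Cys",
    "GAT": "Asp", "AAT": "Asn",
    "TCT": "Ser", "CCT": "Pro",
    "CGT": "Arg",
}


def missense_change(ref_codon, pos_in_codon, alt_base, residue_number):
    """Return a label such as 'Ser49Cys' for a single-base substitution."""
    ref_codon = ref_codon.upper()
    alt_codon = (
        ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    )
    return f"{CODON_TO_AA[ref_codon]}{residue_number}{CODON_TO_AA[alt_codon]}"


# (assumed reference codon, 0-based position of the change, new base, residue)
variants = [
    ("TCC", 1, "G", 49),    # C to G
    ("GAT", 0, "A", 1853),  # G to A
    ("TCT", 0, "C", 707),   # T to C
    ("CCT", 1, "G", 1054),  # C to G
]

labels = [missense_change(*v) for v in variants]
print(labels)
```

In each case the change falls in the first or second codon position, which is why the substitution alters the encoded amino acid rather than being silent.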


Our  second  specific  aim  in  our  statement  of  work  was  to  establish  over  the  first  3  years  of  the  funding 
period  the  incidence  of  ATM  heterozygosity  in  50  breast  cancer  patients  experiencing  a  significant  acute  or 
late  normal  tissue  radiation  injury.  In  the  first  year  of  this  award,  we  have  sequenced  the  cDNA  of  21  breast 
cancer  patients  who  had  radiation  complications.  No  protein  truncating  mutations  were  found  in  this  cohort. 
A total of 5 patients had sequence variation of ATM cDNA compared to that listed in GenBank.
Specifically, a total of 8 single base changes were detected in the 6 patients. Two patients had 2 single
nucleotide changes. The frequencies of these polymorphisms / missense mutations were:

1.  Ser49Cys (shown above): 1/16

2.  Asp1853Asn (shown above): 5/16

3.  Ser707Pro (shown above): 0/16

4.  Pro1054Arg (shown above): 2/19
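
With counts this small, a comparison of carrier frequencies between the two cohorts is usually made with an exact test. The sketch below implements a two-sided Fisher exact test in plain Python and applies it to the Asp1853Asn counts, assuming the 15 of 59 figure reported earlier refers to the nonselected cohort. It is an illustration only, not the analysis planned for this program, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Rows: radiation-injury cohort, nonselected cohort (assumed).
# Columns: carriers, non-carriers of Asp1853Asn.
p_value = fisher_exact_two_sided(5, 16 - 5, 15, 59 - 15)
print(f"Asp1853Asn, injury vs nonselected: p = {p_value:.3f}")
```

The relative tolerance in the comparison guards against floating-point ties between equally likely tables.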

The  third  specific  aim  outlined  in  our  original  statement  of  work  was  to  compare  the  frequency  of 
identified  ATM  polymorphisms  /  mutations  in  the  patient  samples  to  the  frequency  of  these  mutations  in  an 
ethnically  matched  control  set.  Ethnically  matched  controls  were  to  be  obtained  from  a  general  population 
of  individuals  who  did  not  have  a  personal  history  of  cancer.  During  year  1  of  the  award,  our  goal  was  to 
establish  a  control  bank  of  1,000  DNA  samples  from  individuals.  Working  with  our  colleagues  from  the 
Department  of  Epidemiology,  we  have  completed  this  task  and  have  isolated  DNA  from  a  control  sample  of 
960  individuals.  Our  second  task  during  this  period  was  to  develop  ASO  assays  for  rapidly  screening  the 
DNA  bank  for  specific  ATM  polymorphisms.  This  work  is  currently  complete  for  three  of  the  four  missense 
mutations  above  (numbers  1, 2,  and  4).  The  comparative  frequency  testing  for  these  three  polymorphisms 
has  been  performed.  Over  the  remaining  funding  period,  we  plan  to  continue  developing  the  ASO  assay  for 
the  third  polymorphism  above  as  well  as  additional  polymorphisms  that  are  repetitively  found  in  the  patient 
samples.  The  final  task  of  specific  aim  3  is  to  compare  the  frequency  of  ATM  polymorphisms  in  the  patient 
samples  with  case  controls  from  the  stored  DNA  bank.  For  this  task,  we  have  developed  a  new  collaboration 
with  Ranaji  Chakarabarty,  Ph.D.,  Professor  in  the  Human  Genetics  Center  at  the  University  of  Texas 
Houston  School  of  Public  Health.  Doctor  Chakarabarty  has  specific  expertise  in  population  genetics  and  will 
serve  as  an  important  mentor  in  the  data  analysis.  Based  on  this  strategy,  the  appropriate  sample  size  to 
validate  any  unique  polymorphism  can  be  determined  and  a  more  efficient  and  rapid  screening  of  a  large 
breast  cancer  population  using  the  ASO  assay  rather  than  complete  gene  sequencing  can  be  accomplished. 
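
The sample size needed to validate a polymorphism depends on the carrier frequencies expected in patients and controls. The following sketch uses the standard normal-approximation formula for comparing two proportions. The frequencies of 10% and 5%, the two-sided significance level of 0.05, and the power of 80% are hypothetical inputs chosen for illustration and are not values from this study.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_cases, p_controls, alpha=0.05, power=0.80):
    """Subjects per group needed to detect p_cases vs p_controls (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_cases + p_controls) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_cases * (1 - p_cases) + p_controls * (1 - p_controls))
    ) ** 2
    return ceil(numerator / (p_cases - p_controls) ** 2)


# Hypothetical carrier frequencies: 10% in patients, 5% in controls.
n_required = n_per_group(0.10, 0.05)
print(n_required)
```

Because the required number rises steeply as the difference between the two frequencies shrinks, a rapid assay is better suited than full gene sequencing to screening populations of this size.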

In addition to the specific tasks outlined above, we have developed additional collaborations focused
on ATM research that directly parallels the specific aims of this study. Specifically, in a collaborative effort
with Ms. Penelope Bonnen, Ph.D., and David Nelson, Ph.D. of the Department of Molecular and Human
Genetics of Baylor College of Medicine, we established haplotypes in the ATM gene. This work was
important  in  establishing  the  feasibility  of  using  haplotype  association  studies  to  detect  individuals  with  a 
genetic  predisposition  for  a  disease.  A  manuscript  of  this  work  is  included  in  Appendix  A  and  will  be 
published  in  Am  J  Hum  Gen  (67:, 2000).  The  support  from  the  Career  Development  Award  was 
acknowledged  in  the  manuscript. 

In  addition  to  this  collaboration,  the  first  year  funding  from  the  Career  Development  Award  has 
helped  to  further  development  of  my  career  in  breast  cancer  radiation  oncology.  Some  of  the  insights  gained 
from  our  research  inspired  me  to  further  explore  the  relationship  between  breast  cancer  development  and 
cellular radiosensitivity. In a separate study, we demonstrated that lymphocytes from patients with bilateral
breast  cancer  had  a  greater  number  of  radiation-induced  chromatid  breaks  than  controls.  The  initial  report 
from  this  work  resulted  in  a  peer-reviewed  oral  presentation  at  the  First  International  Conference  on 
Translational  Research  and  Pre-Clinical  Strategies  in  Radio-Oncology,  Lugano,  Switzerland  (March,  2000) 
and  a  manuscript  accepted  for  publication  in  the  Int  J  Radiat  Oncol  Biol  Phys.  The  support  from  my  DOD 
career  development  award  was  also  acknowledged  in  this  manuscript.  This  manuscript  is  included  in 
Appendix  B. 
Key research accomplishments (Year 1)

•  Sequenced  ATM  cDNA  from  89  breast  cancer  patients 

•  Identified  4  single  nucleotide  base  changes  resulting  in  an  amino  acid  change  in  the  protein  present  in 
multiple  patients 

•  Established  a  control  bank  of  DNA  from  960  individuals  without  a  cancer  history  for  population 
comparison  studies 

•  Developed  an  allele  specific  oligonucleotide  assay  for  3  single  nucleotide  base  changes  in  the  ATM  gene 

•  Established  a  collaborative  effort  that  demonstrated  the  feasibility  of  using  haplotype  association  studies 
with  the  ATM  gene. 

•  Demonstrated  that  bilateral  breast  cancer  patients  have  an  increased  number  of  radiation-induced 
chromatid  breaks  compared  to  controls  showing  the  feasibility  of  a  phenotype  rather  than  genotype  assay 
to  predict  breast  cancer  risk. 
Reportable Outcomes

-  manuscripts,  abstracts,  presentations: 

1.  Buchholz TA, Wu XF: Radiation-induced chromatid breaks as a predictor of breast cancer risk. Int J
Radiat Oncol Biol Phys, 2000.

2.  Bonnen  PE,  Story  MD,  Ashom  CL,  Buchholz  TA,  Wiel  MM,  Nelson  DL:  Haplotypes  at  ATM 
identify  coding  sequence  variation  and  indicate  a  region  of  reduced  recombination.  Am  J  Hum  Gen, 
2000. 

3.  Buchholz  TA,  Wu  XF:  Radiation-induced  genomic  instability  as  a  predictor  for  the  risk  of  breast 
cancer  development.  Int  J  Rad  Oncol  Biol  Phys  46(3):766  (#198),  2000. 


-  patents  and  licenses  applied  for  and/or  issued:  none 

-  degrees  obtained  that  are  supported  by  this  award:  none 

-  development  of  cell  lines,  tissue  or  serum  repositories 

1.  established  a  control  bank  of  DNA  from  960  individuals  with  no  cancer  history  to  serve  for 
population  frequency  testing 


-  informatics such as databases and animal models, etc:

1.  established  a  database  for  enrolled  patients  and  controls  that  cover  patient  demographics,  cancer 
history,  and  toxicity  from  radiation  treatment. 


-  funding  applied  for  based  on  work  supported  by  this  award:  none 

-  employment  or  research  opportunities  applied  for  and/or  received  on  experiences/training  supported  by 
this  award:  none. 
Conclusion


Over  the  first  year  of  this  4-year  program,  we  have  been  successful  in  sequencing  ATM  cDNA  in 
breast  cancer  patients  and  breast  cancer  patients  with  radiation  injury.  Based  on  these  studies,  we  conclude 
that  these  populations  have  multiple  single  nucleotide  base  changes  that  may  have  contributed  to  their  disease 
and  treatment-related  toxicity.  To  explore  this  relationship  further,  we  have  successfully  developed  allele 
specific  oligonucleotide  assays  to  test  population  frequency  of  single  nucleotide  changes  in  patients  and 
ethnically-matched  controls. 

In  addition  to  continuing  the  planned  studies  outlined  in  our  original  statement  of  work  and  body  of 
this report, we hope to determine whether some of the single base changes result in cellular radiosensitivity and
deficiency  in  DNA  damage  repair.  We  have  obtained  institutional  review  board  approval  of  a  protocol  to 
obtain  skin  fibroblasts  from  individuals  in  whom  our  sequencing  studies  have  identified  single  nucleotide 
changes.  The  fibroblasts  will  be  grown  in  culture  for  in  vitro  assays  of  radiosensitivity,  DNA  damage  repair, 
and  biochemistry  assays. 

Establishing  a  relationship  between  ATM  and  breast  cancer  development  and/or  normal  tissue 
toxicity  following  breast  cancer  radiation  treatment  would  be  a  significant  contribution  to  breast  cancer 
research. If our data identify an association between specific single nucleotide changes and breast cancer,
further  studies  will  be  needed  to  determine  the  excess  risk.  Ultimately,  allele  specific  oligonucleotide  assays 
could  serve  as  a  simple  tool  to  screen  large  populations  of  women  to  further  define  their  breast  cancer  risk 
and  risk  of  treatment-related  toxicity. 
Haplotypes at ATM Identify Coding-Sequence Variation and Indicate a
Region  of  Extensive  Linkage  Disequilibrium 

Penelope  E.  Bonnen,1  Michael  D.  Story,2  Cheryl  L.  Ashom,2  Thomas  A.  Buchholz,3 

Genetic variation in the human population may lead to functional variants of genes that contribute to risk for
common  chronic  diseases  such  as  cancer.  In  an  effort  to  detect  such  possible  predisposing  variants,  we  constructed 
haplotypes  for  a  candidate  gene  and  tested  their  efficacy  in  association  studies.  We  developed  haplotypes  consisting 
of  14  biallelic  neutral-sequence  variants  that  span  142  kb  of  the  ATM  locus.  ATM  is  the  gene  responsible  for  the 
autosomal  recessive  disease  ataxia-telangiectasia  (AT).  These  ATM  noncoding  single-nucleotide  polymorphisms 
(SNPs)  were  genotyped  in  nine  CEPH  families  (89  individuals)  and  in  260  DNA  samples  from  four  different  ethnic 
origins.  Analysis  of  these  data  with  an  expectation-maximization  algorithm  revealed  22  haplotypes  at  this  locus, 
with three major haplotypes having frequencies ≥.10. Tests for recombination and linkage disequilibrium (LD)
show  reduced  recombination  and  extensive  LD  at  the  ATM  locus,  in  all  four  ethnic  groups  studied.  The  most 
striking  example  was  found  in  the  study  population  of  European  ancestry,  in  which  no  evidence  for  recombination 
could  be  discerned.  The  potential  of  ATM  haplotypes  for  detection  of  genetic  variants  through  association  studies 
was  tested  by  analysis  of  84  individuals  carrying  one  of  three  ATM  coding  SNPs.  Each  coding  SNP  was  detected 
by  association  with  an  ATM  haplotype.  We  demonstrate  that  association  studies  with  haplotypes  for  candidate 
genes  have  significant  potential  for  the  detection  of  genetic  backgrounds  that  contribute  to  disease. 


Introduction 

Qualifying and quantifying the genetic contribution to the etiology of common complex disease remains
one of the great quests of modern medical genetics. The complexity of multifactorial diseases challenges
the paradigms and tools of conventional genetic research. Traditional methods of genetic analysis do not
have the statistical power or sensitivity for the task of teasing out a genetic contribution when it is
subtle or when several genes may be working together (Risch and Merikangas 1996). Genomewide association
studies, as well as population studies with candidate genes, have been touted as possible alternatives to
linkage analysis (Risch and Merikangas 1996; Collins et al. 1997; Kruglyak 1999; Risch 2000). These
approaches focus on finding either a causative variant or a genetic variant closely linked with the
disease phenotype. Some studies utilizing single-nucleotide polymorphisms (SNPs) have succeeded in
detecting the risk for disease, notably in the case of the apolipoprotein type E (apoE) gene and both
coronary artery disease (Boerwinkle et al. 1996) and Alzheimer disease (Strittmatter and Roses 1995).
These studies were able to directly assess the risk conferred by known apoE functional variants. In some
other cases, however, attempts to correlate single-locus alleles with phenotypes have produced mixed
results (Josefsson et al. 1998; Kraft et al. 1998; Storey et al. 1998).

Haplotype association with disease by the linkage disequilibrium (LD) approach has been used successfully
for the identification of genomic regions containing loci responsible for disease phenotypes (MacDonald
et al. 1992; Yu et al. 1996). The same principle can be applied by use of haplotypes of biallelic markers
to detect disease association. Using several SNPs distributed across 100-200 kb should result in
statistical sensitivity that is greater than that in studies using fewer loci. Another strength of such an
approach is the ability to use purely epidemiological populations for detection of chromosomal
backgrounds lending risk for disease.

All of these approaches are, to one extent or another, dependent on LD. An understanding of LD
relationships between markers will inform the efficacy and design of future LD-based strategies for
detection of genetic contributions to common disease. Simulation studies have estimated the length of
useful LD to be as low as 3 kb (Kruglyak 1999). Recent investigations support the notion that LD varies
throughout the genome (Collins et al. 1999; Taillon-Miller et al. 2000) and that it can extend to
considerable lengths, such as several hundred kilobases (Collins et al. 1999; Eaves et al. 2000; Moffatt
et al. 2000; Taillon-Miller et al. 2000). Reports of such extreme differences indicate the need for
further study of the extent and nature of LD.
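
To make the notion of LD between two markers concrete, the following sketch computes three commonly used pairwise measures (D, D', and r²) from haplotype and allele frequencies. The text above does not say which statistic was used, so this is a generic illustration; the function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (both loci polymorphic)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical example: allele A (frequency 0.3) is found only with allele B (0.5).
d, d_prime, r2 = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.5)
print(round(d, 3), round(d_prime, 3), round(r2, 3))
```

In this example D' reaches 1 because one of the four possible haplotypes is absent, while r² stays below 1 because the two allele frequencies differ.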

Allelic variation leading to functional variants of genes may predispose to risk for seemingly sporadic
cases of common disease (Lander 1996; Collins et al. 1997). Here we describe a strategy for exploring the
possible effects of functional variants of genes involved in familial cancers. We use a resequencing
approach to detect SNPs across a large (184 kb) genomic region containing the ATM gene. ATM is
responsible for the autosomal recessive disease ataxia-telangiectasia (A-T) (Savitsky et al. 1995). A-T
is characterized by cerebellar ataxia, oculocutaneous telangiectasia, immune deficiency, sensitivity to
ionizing radiation, increased incidence of tumors, and chromosomal instability (Gatti et al. 1991). A-T
heterozygotes may be at increased risk for development of cancers, most prominently, and
controversially, breast cancer (Swift et al. 1987, 1991; Morrell et al. 1990; Stankovic et al. 1998;
Gatti et al. 1999). With carrier frequencies estimated to be from 0.5% to >1% (Swift et al. 1986; Gatti
et al. 1999), assessment of cancer risk for this population is a compelling endeavor. In addition, the
ATM-gene product is centrally involved in cellular responses to DNA damage, including DNA double-strand
break repair and signaling leading to cell-cycle arrest and apoptosis (reviewed in Rotman and Shiloh
1999). We genotyped 295 individuals from four ethnic groups, for 14 SNP markers that spanned 142 kb. An
expectation-maximization algorithm estimated 22 ATM haplotypes from these data. Tests for recombination
and LD revealed (a) no evidence for recombination in the white European American study population and
(b) perfect disequilibrium extending the full length marked by these SNPs. We then conducted a model
association study with these haplotypes and a population of samples that possessed one of three different
coding SNPs (cSNPs) in the ATM gene. The results of this study provide strong support for the utility of
complex SNP haplotypes as a means to detect polymorphisms in a population-based sample.
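
The expectation-maximization step mentioned above can be illustrated with a deliberately reduced sketch for two biallelic SNPs, in which only double heterozygotes are ambiguous in phase. This is not the program used in the study, which handled 14 markers; the genotype coding, the function name, and the toy data are assumptions made for the example.

```python
from collections import Counter

HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1 or 2) of allele 1.
    """
    freqs = {h: 0.25 for h in HAPS}
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = Counter()
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the double heterozygote between its two phasings.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is unambiguous when at most one site is heterozygous.
                counts[(int(g1 >= 1), int(g2 >= 1))] += 1
                counts[(int(g1 == 2), int(g2 == 2))] += 1
        # M-step: re-estimate frequencies from the expected haplotype counts.
        freqs = {h: counts[h] / n_chrom for h in HAPS}
    return freqs


# Toy data: 4 individuals 0/0, 3 individuals 2/2, 3 double heterozygotes.
toy = [(0, 0)] * 4 + [(2, 2)] * 3 + [(1, 1)] * 3
estimated = em_haplotype_freqs(toy)
print({h: round(f, 3) for h, f in estimated.items()})
```

Because the unambiguous individuals carry only the 0-0 and 1-1 haplotypes, the iterations assign the double heterozygotes almost entirely to that phasing.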


Subjects  and  Methods 

Human  Subjects 

For  SNP  discovery,  genomic  DNA  from  five  unrelated 
white  European  Americans  was  sequenced.  This  DNA 
was  extracted  from  lymphoblast  and  fibroblast  cell  lines. 


For SNP genotyping, individuals from four ethnic groups were sampled: African American (n = 71), Asian
American (n = 39), white European American (n = 77), and Hispanic (n = 73). All ethnic samples
(self-described ethnicity) were part of a collection of 941 DNA purified samples from anonymous blood
donors in community-based blood drives in southeastern and central Texas. Samples analyzed in the model
association study were also from this DNA collection. Members of nine CEPH families were also analyzed.
In all families, four grandparents, two parents, and four children were examined; since two of these
families share a grandparent, 89 individuals were genotyped, and the number of segregating
chromosomes is 70.
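
The two family counts above can be checked with simple arithmetic. The sketch assumes that the 70 segregating chromosomes are the two chromosomes carried by each distinct grandparent (founder); that reading is an interpretation and is not stated explicitly in the text.

```python
families = 9
grandparents, parents, children = 4, 2, 4
shared_grandparents = 1  # two families share one grandparent

# Every family contributes ten members; the shared grandparent is counted once.
individuals = families * (grandparents + parents + children) - shared_grandparents

# Assumption: independent chromosomes come from the distinct grandparents only.
founder_chromosomes = 2 * (families * grandparents - shared_grandparents)

print(individuals, founder_chromosomes)
```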

Samples  from  Great  Apes 

Six great-ape samples were genotyped in this study: two from common chimpanzees (Pan troglodytes), one
from a bonobo (P. paniscus), two from western lowland gorillas (Gorilla gorilla), and one from an eastern
lowland gorilla (G. g. graueri).

PCR  and  Sequencing  Primers 

Primers for DNA amplification and sequencing were designed by MacVector, version 6.0.1. The 184-kb
genomic sequence of ATM was masked for repetitive sequence, by Repeat Masker. Thirty-six primer sets were
designed to amplify regions containing little or no repeat sequence, distributed evenly throughout the
sequence. Primers were selected that met strict criteria for melting temperature and that amplified
regions containing very little or no repeat sequence. The same primers were used for PCR and sequencing
reactions and are listed in Appendix A.

PCR  Amplification  of  Genomic  DNA 

Genomic DNA from five unrelated individuals was amplified by means of 29 of the 36 primer sets mentioned
above. The 50-µl reactions included DNA (200 ng), standard PCR buffer, dNTPs (0.1 mM each), Taq (0.5 µl;
Perkin-Elmer), and primers (1 µM each). PCR was performed in a Perkin Elmer 9700 analyzer, with an
initial denaturation at 95°C for 5 min, followed by 30 cycles of 95°C for 30 s, 60°C for 30 s, and 72°C
for 30 s, and a final step at 72°C for 7 min. For all amplicons, 6 µl of PCR product was run on a 1.5%
agarose gel.

DNA  Sequencing 

PCR products were purified and sequenced. Preparation of DNA for sequencing included incubation of
~60 ng of PCR product with shrimp alkaline phosphatase (2 U; Amersham) and exonuclease I (10 U;
Amersham) at 37°C for 15 min, followed by enzymatic inactivation at 80°C for 15 min. Sequencing of each
PCR product was performed with the Thermo Sequenase [33P]-radiolabeled terminator-cycle sequencing kit
(Amersham Pharmacia), according to the manufacturer's instructions. Sequencing reactions were performed
in a Perkin Elmer 9700 analyzer, with an initial denaturation at 95°C for 1 min, followed by 35 cycles of
95°C for 30 s, 55°C for 30 s, and 72°C for 1 min. Samples were run on 6% polyacrylamide gels, fixed for
15 min in 5% acetic acid/20% methanol, and dried.

Multiplex  PCR 

Sequencing revealed 17 SNPs in 15 different regions
of the gene. These 15 PCR amplicons were multiplexed
into two PCR reactions. Multiplex group 8 amplifies
eight fragments, and multiplex group 7 amplifies seven
fragments. The 50-µl reactions for group 7 included
DNA (400 ng), standard PCR buffer (2×), dNTPs (0.2
mM each), and Taq (0.5 µl; Perkin-Elmer). The 50-µl
reactions for group 8 included DNA (400 ng), standard
PCR buffer (1.8×), dNTPs (0.2 mM each), and Taq
(0.5 µl; Perkin-Elmer). Primers include some of those
originally designed for sequencing and some newly
designed to alter the size of the amplicons. Product
sizes differed by ≥20 bp, so that the products could be
resolved from one another on a 2.5% agarose gel.
Amplification of all products in each multiplex PCR was
confirmed by running 6 µl of product on a 2.5% agarose
gel. The concentrations and primer sequences used for
PCR are listed in Appendix B.

Allele-Specific Oligonucleotide (ASO) Hybridizations

Genotypes for each SNP were determined in all sample
populations, by ASO hybridizations. ASO hybridizations
were performed as described by DeMarchi et al.
(1994). We performed ASO hybridization for 14 SNPs
for each individual typed. These 14 SNPs were chosen
from the original 17 because they performed consistently
well under standard ASO-hybridization conditions.
Hybridizations were performed under conditions that
allowed for annealing of only the probe that is an exact
match for the substrate DNA. Genotypes for SNPs were
read on at least two independent occasions. The
sequences of the ASO-hybridization probes are listed in
Appendix C.

Estimation  of  Haplotypes  and  Frequencies 

Haplotypes and their frequencies were estimated on
the basis of unphased genotype data, by the computer
program EMHAPFRE. Described in the work of Excoffier
and Slatkin (1995), EMHAPFRE uses an
expectation-maximization algorithm that determines the
maximum-likelihood frequencies of multilocus haplotypes in
diploid populations. Only individuals who were scored
for all 14 SNPs were included in the data analysis.
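
The expectation-maximization idea behind this step can be illustrated with a minimal Python sketch. It is not EMHAPFRE itself: the function names, the 0/1/2 genotype coding (copies of one allele at each biallelic site), the uniform starting frequencies, and the convergence settings are all assumptions made for illustration, and Hardy-Weinberg proportions are assumed in the E step.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with one genotype.

    genotype: tuple of 0/1/2 per site (hypothetical coding: copies of allele 1).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites; het sites are filled below
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = base[:], base[:]
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform start (an assumption)
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E step: weight each possible phase by its Hardy-Weinberg probability
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M step: expected haplotype counts become the new frequencies
        new_freq = {h: counts[h] / n_chrom for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq
```

In a toy sample of three 0/0 homozygotes, three 2/2 homozygotes, and one double heterozygote at two sites, the estimate moves almost all of the weight onto the two haplotypes already seen in the homozygotes, which is the behavior one would expect from a frequency-driven phase resolution.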

Haplotype  Assignment  to  Genotype  Data 

A short script written in Microsoft Excel Visual Basic
and named “Assign” was used to assign haplotype pairs to
individual samples. The script was given, as input, the
list of haplotypes produced by EMHAPFRE and the raw
unphased genotype data. It produces a list of the input
samples, each with the pair or pairs of haplotypes that
satisfy its genotype data; in cases in which multiple
pairs of haplotypes were listed, one pair was chosen, by
use of a haplotype frequency-based method. A probability
was calculated for each haplotype pair, by multiplication
of the haplotypes’ frequencies in the control population.
The haplotype pair with the highest probability
was assigned to the individual.
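
The selection rule described for Assign can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' Visual Basic script: the function name, the encoding of haplotypes as tuples of 0/1 alleles, and the 0/1/2 genotype coding are assumptions.

```python
def assign_haplotype_pair(genotype, hap_freqs):
    """Pick the compatible haplotype pair with the largest frequency product.

    genotype:  tuple of 0/1/2 per site (hypothetical coding: copies of allele 1)
    hap_freqs: dict mapping haplotype tuples of 0/1 to their estimated frequency
    Returns ((hap_a, hap_b), probability), or None if no pair fits the genotype.
    """
    haps = list(hap_freqs)
    best = None
    for i, a in enumerate(haps):
        for b in haps[i:]:
            # a pair resolves the genotype if allele counts match at every site
            if all(x + y == g for x, y, g in zip(a, b, genotype)):
                p = hap_freqs[a] * hap_freqs[b]  # product of the two frequencies
                if best is None or p > best[1]:
                    best = ((a, b), p)
    return best
```

With frequencies of .5, .3, .1, and .1 for haplotypes 00, 11, 01, and 10, a double heterozygote is assigned the pair 00/11 (product .15) rather than 01/10 (product .01).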

Statistical  Analysis  for  Recombination  and  LD 

To  test  for  recombination,  we  used  the  four-gamete 
test  and  the  Hudson  and  Kaplan  (1985)  recombination 
statistic,  R.  For  a  given  haplotype  AB,  mutation  may 
result  in  either  Ab  or  aB.  Haplotype  ab  arises  only  in 
the  case  of  either  recombination  or  repeat  mutation.  The 
four-gamete  test  was  executed  on  unphased  genotype 
data,  in  a  pairwise  fashion,  across  all  SNP  loci.  On  the 
basis of the resulting matrix of the four-gamete test, R
estimates the minimum number and the probable locations of
recombination events in the history of the sample.
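
A minimal Python sketch of the pairwise four-gamete test follows. It assumes phased haplotypes given as equal-length strings and biallelic sites, whereas the analysis described here was run in DnaSP on unphased genotype data; the function name is hypothetical.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return the site pairs (i, j), i < j, at which all four gametic types occur.

    haplotypes: list of equal-length strings, one per chromosome (assumed phased).
    """
    n_sites = len(haplotypes[0])
    hits = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # AB, Ab, aB, and ab are all present
            hits.append((i, j))
    return hits
```

For the toy haplotypes ACT, ATT, GCT, and GTC, only the first two sites show all four gametic types, so the function returns the single pair (0, 1).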

Initial LD analysis was performed by pairwise
comparison of all SNP loci. Fisher’s exact test
was used to determine significance levels. SNPs having
a minor-allele frequency ≤.05 were excluded from LD
analyses. The LD statistic $D$ is a pairwise comparison of
gametic frequencies such that $D = p_{11}p_{22} - p_{12}p_{21}$.
$D'$, the relative disequilibrium, is $D' = D/|D|_{\max}$, where
$|D|_{\max} = \min(p_1p_2, q_1q_2)$ if $D < 0$ and
$|D|_{\max} = \min(q_1p_2, p_1q_2)$ if $D > 0$. $D'$ ranges from 1 to −1, and
this range is not influenced by allele frequency.
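
As a worked illustration of these two statistics, the following Python sketch computes D and D' from the four gametic frequencies at a pair of biallelic loci. The function name and argument order are assumptions; the normalization is the standard Lewontin form, in which |D|max is the smaller of the two bounding products for the sign of D.

```python
def d_prime(p11, p12, p21, p22):
    """Return (D, D') for two biallelic loci from the four gametic frequencies.

    p11..p22 are the frequencies of the four gametes and are assumed to sum to 1.
    """
    p1, q1 = p11 + p12, p21 + p22  # allele frequencies at the first locus
    p2, q2 = p11 + p21, p12 + p22  # allele frequencies at the second locus
    d = p11 * p22 - p12 * p21
    if d < 0:
        d_max = min(p1 * p2, q1 * q2)
    else:
        d_max = min(p1 * q2, q1 * p2)
    # a monomorphic locus gives d_max == 0; D' is reported as 0 in that case
    return d, (d / d_max if d_max > 0 else 0.0)
```

Gametic frequencies of .4, .1, .2, and .3 give D = .10 and D' = .5, whereas a sample containing only the two gametes AB and ab gives D' = 1.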

All recombination and LD statistics were generated
by the software program DnaSP 3.00 (written by J. Rozas
and R. Rozas, University of Barcelona).

Statistical  Analysis  for  Association  Study 

Testing for significance in the model association study
was done by use of contingency tables for independence.
P values for significance of association were determined
by use of 2 × 2 tables at the haplotype level and
3 × 3 tables at the genotype level. Significance values
refer to a one-sided test.

Figure 1  Schematic of ATM. The 184 kb of the ATM locus is illustrated, with the 64 exons represented by black boxes. Twenty-nine
~500-bp regions were amplified by PCR in five unrelated individuals. These regions were sequenced and found to contain 17 SNPs.
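
The contingency-table testing described under Statistical Analysis for Association Study can be sketched as follows. The text specifies contingency tables and a one-sided test but not the test statistic, so this Python example assumes a one-sided Fisher's exact test on a 2 × 2 table of chromosome counts; the function name and the table layout are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of a top-left count at
    least as large as the observed a. In the assumed layout, row 1 holds case
    chromosomes, row 2 control chromosomes, and column 1 the haplotype of interest.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)
```

A perfectly separated table such as [[3, 0], [0, 3]] gives P = 1/20 = .05, since only one of the 20 equally likely arrangements is as extreme as the observed one.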


Results 

SNP  Discovery 

Our initial objective was to discover common neutral
sequence variants spanning the length of the ATM gene.
A gel-based resequencing strategy was employed to detect
SNPs at the ATM locus. Genomic DNA of five unrelated
individuals was amplified, by PCR, for
[33P]-radiolabeled sequencing. For detection of markers
spanning the entire locus, PCR primers were designed
for amplicons dispersed approximately evenly throughout
the 184-kb genomic region containing the gene (fig.
1). Approximately 13.5 kb of the 184-kb total sequence
was read in each individual. The nucleotide diversity, π,
calculated for these sequence data was .00057. Seventeen
SNPs were found, which span 142 kb and all of which
are located in introns (table 1). This yielded an average
of 1 SNP/794 nucleotides sequenced.

Genotyping  and  Haplotype  Development 

To begin construction of haplotypes from these SNPs,
we genotyped nine three-generation CEPH families
(Dausset et al. 1990). By using three-generation families,
we could determine haplotypes from genotype data,
through inference. This allowed us both to determine
the efficacy of the computer algorithm used to predict
haplotypes (see below) and to optimize our genotyping
assay. We performed ASO hybridization on nine CEPH
families (89 individuals; 70 chromosomes), for 14 of the
original 17 SNPs. These 14 SNPs were chosen from the
original 17 because they performed consistently well
under standard ASO-hybridization conditions.

We then used two different methods for deciphering
the haplotypes derived from the genotype data, in a
side-by-side comparison. First, haplotypes were inferred by
hand. We began with homozygotes and predicted other
haplotypes on the basis of transmission and by
establishing the phase through the pedigrees. Seven
haplotypes were identified in the sample of CEPH families.
Subsequently, we subjected the same data set to an
expectation-maximization algorithm, to estimate haplotypes
and their frequencies. The computer program EMHAPFRE
is a maximum-likelihood program developed
to predict multilocus haplotypes from unphased genotype
data (Excoffier and Slatkin 1995). It produces both
a list of haplotypes and their estimated frequencies in
the input sample population. The haplotype predictions
from EMHAPFRE were in complete accordance with
those that had been inferred manually, giving us
confidence that this program was suitable for data of this
nature.

Table 1

Seventeen ATM Noncoding SNPs Detected by Resequencing

                        Location in Genomic Sequence with
SNPᵃ                    GenBank Accession Number U82828
Prior to 5′UTR t→aᵇ     10182
IVS8-356t→c             34293
IVS19-1276a→g           57469
IVS21-77t→c             60136
IVS26+491c→gᶜ           71049
IVS27-193c→tᶜ           75083
IVS34+754g→a            85811
IVS46-257a→c            112721
IVS55+186c→t            121819
IVS57+3570t→c           127195
IVS58+997g→a            132032
IVS59+414g→tᶜ           133986
IVS61-55t→c             142611
IVS62+60g→a             142789
IVS62+424g→a            143153
IVS62-973a→c            151964
IVS62-694c→a            152243

ᵃ Nomenclature is according to the guidelines recorded by the Ad Hoc Committee on Mutation Nomenclature (1996).
ᵇ This SNP is named in reference to the genomic sequence having GenBank accession number U82828 because of the highly variable nature of the 5′UTR.
ᶜ Not used in genotyping or haplotype analysis.

Table 2

ATM Haplotypes of 295 Humans from Five Ethnic Groups and of Three Species of Great Apes

Frequency in Humansᵃ

Columns, in order: Haplotype; Sequence; Overall (n = 295); African American (n = 71); Asian American (n = 39); White European American (n = 77); Hispanic American (n = 73); CEPH (n = 35). Each row lists only its nonblank cells, in column order.

1      ACTCTACTTCCCTC    .002  .007
2ᵇ     ACTCTACTTCTTTC    .313  .190  .500  .292  .315  .394
3      ACTCTCCTTCTTTC    .037  .065  .048  .061
4      ACTTCACTCCTCTC    .002  .007
5      ACTTTACTCTCCTCᶜ   .002
6ᵇ     ACTTTACTTCCCTC    .066  .218  .013  .013  .027  .015
7      ACTTTACTTCTTTC    .019  .077
8      ATTCTACTTCTTTC    .012  .007  .051  .007
9      ATTCTCCTTCTTTC    .000  .013
10     ATTTCACTCCCCTC    .002  .007
11     ATTTCATCCTCCCC    .002  .013
12     TCTCTACTTCTTTC    .007  .021  .015
13     TCTTCACTCTCCTC    .010  .035  .007
14     TCTTCATCCTCCCC    .002  .007
15ᵇ    TTCTCACTCTCCTA    .090  .028  .013  .175  .041  .227
16     TTTCTATCCTCCCC    .005  .017  .007
17ᵇ    TTTTCACCCTCCTC    .100  .141  .068  .097  .110  .015
18     TTTTCACCCTCTTC    .002  .007
19     TTTTCACTCCTTTC    .002  .007
20     TTTTCACTCTCCTA    .002  .007
21ᵇ    TTTTCACTCTCCTC    .048  .162  .013  .006  .027
22ᵇ    TTTTCATCCTCCCC    .277  .113  .291  .351  .363  .273
       TTTCTACCCTCCTCᶜ   .009
       ACTTTACCCTCCTCᶜ   .007
Total                    1.000  1.000  1.000  1.000  1.000  1.000

Frequency in Great Apesᵈ

Haplotype  Sequence          Chimpanzee (n = 2)  Bonobo (n = 1)  Gorilla (n = 3)
1          TCTTTACTCTCCTC    .750                1.000           .000
2          TCTTTACTCTCTTC    .250                .000            .000
3          TATTTACTCTCCTC    .000                .000            1.000

ᵃ Samples were genotyped by ASO hybridization, then haplotypes and their frequencies were estimated from unphased genotype data, by the EM algorithm EMHAPFRE.
ᵇ Haplotype present in all four ethnic groups studied.
ᶜ Low-frequency haplotypes in which some differences were seen in the combined data set and in individual ethnic populations.
ᵈ Samples were genotyped by ASO hybridization and fluorescent sequencing.

Haplotype  and  Allele  Frequencies 

To determine frequencies of haplotypes and of individual
SNPs in different ethnic populations, we performed
ASO hybridization on anonymous African
American (n = 71), Asian American (n = 39), white
European American (n = 77), and Hispanic (n = 73)
DNA samples collected in central and southeastern
Texas. Genotype data were analyzed by the EMHAPFRE
program. For the total population, 22 haplotypes and
their frequencies were predicted by EMHAPFRE (table
2). Three predominant haplotypes were found at
frequencies ≥10%. An independent study that examined
neutral sequence variants at the ATM locus also found
three major haplotypes (Li et al. 1999).

Figure 2  ATM SNP allele frequencies for 14 ATM SNPs in each of four ethnic groups. A total of 260 individuals (71 African American,
39 Asian American, 77 white European American, and 73 Hispanic) were genotyped by ASO hybridization.

The majority of SNPs identified in this study have a
frequency, in all ethnic groups, of ≥25% (fig. 2). Of
the 14 SNPs, 3 (IVS19-1276a→g, IVS46-257a→c,
and IVS62-694c→a) have a minor-allele frequency of
<10% in most ethnic groups. SNP frequencies vary
across ethnic groups. Three SNPs (IVS55+186c→t,
IVS62+424g→a, and IVS62-973a→c) have a frequency
of 11% in African Americans while being present at a
frequency of >30% in all other ethnic groups. SNP
IVS46-257a→c was not found in the samples from African
Americans. Of the three low-frequency SNPs, two
(IVS19-1276a→g and IVS62-694c→a) have a frequency
of >18% in the white European American population
and of <6% in the others. This is not surprising,
given that the original five samples used for SNP
detection were white European Americans.

To begin to describe the haplotype phylogeny at the
ATM locus, we wanted to determine what haplotypes
were present in each ethnic population. The genotype
data were analyzed, by EMHAPFRE, as four separate
data sets segregated by ethnic group. However, this
analysis led to small discrepancies from what was predicted
from the complete data set. In each case, changes were
found in the lowest-frequency haplotypes (table 2). The
efficacy of EMHAPFRE is known to decay as data sets
decrease in size (Excoffier and Slatkin 1995). Thus, a
second approach to ascription of haplotypes and their
frequencies to each ethnic group was taken. To this end,
a simple script was written in Microsoft Excel Visual
Basic. This script, named “Assign,” takes a list of
haplotypes and a data set of unresolved genotypes and then
lists, for each individual sample, every pair of
haplotypes that can resolve its genotype data. We input
each ethnic group’s data set individually with the 22
haplotypes. In this way we were able to determine which
of the haplotypes suggested by EMHAPFRE were necessary
for resolution of our genotype data, thus further
refining the results. The genotype of every sample in
this study could be accounted for by at least one pair
of the 22 haplotypes predicted by EMHAPFRE from the
complete data set. Six of the 22 haplotypes exist in all
ethnic populations, and 11 of them are unique to a single
population and hereafter are referred to as “private”
haplotypes (table 2); each of these 11 haplotypes has a
frequency of <1%.

Figure 3  Four-gamete test for recombination in ATM. White boxes denote site pairs having four gametic types, which implies that
recombination has occurred between these two sites. Also shown is the Hudson and Kaplan recombination statistic R, which is an estimate of
the number and sites of recombination events needed to explain the results of the four-gamete matrix. A white box containing an “×” denotes
a potential site of recombination. The asterisk (*) denotes an SNP that is not polymorphic in the sample population.


We analyzed primate DNA in order to approximate
an ancestral ATM haplotype. Three haplotypes were
found in 12 chromosomes (table 2). Two common
chimpanzees, one bonobo, and three gorillas were genotyped
by ASO hybridization and fluorescent sequencing; in
cases in which ASO hybridization gave ambiguous
results, fluorescent sequencing was used to confirm the
genotype. None of the ape haplotypes was found among
the 22 human haplotypes. One ape haplotype differs
from a human haplotype by a single-base variant. This
human haplotype is one of the least common (frequency
.007) and occurs only in our African American study
group. Only one of the human SNPs showed variation
in the apes; the remainder were monomorphic. One common
chimpanzee was heterozygous for IVS62+424g→a.
The gorillas shared all but one allele with the
chimpanzees. At IVS8-356t→c, gorillas are homozygous for a
third allele (A), which is not found in either humans or
chimpanzees.

Intragenic  Recombination  and  LD 

The small number of haplotypes seen in our study
population suggests the possibility that recombination
is reduced at the ATM locus. This is further evidenced
by the results of the four-gamete test (fig. 3) (Hudson
and Kaplan 1985). For a given haplotype AB, mutation
may result in either Ab or aB. Haplotype ab arises
only in the case of either recombination or repeat
mutation. For the purpose of this analysis, we will consider
repeat mutation to be rare and will use the four-gamete
test as a measure of recombination. The four-gamete test
was executed on unphased genotype data, in a pairwise
fashion across SNP loci. This was done for each ethnic
group separately. Interestingly, the four-gamete test
found no site pairs with four gametes in the samples
from white European Americans, implying a complete
lack of recombination in that population. Low
recombination was indicated for the other groups, as shown
in figure 3.

Figure 4  Fisher’s exact test for LD in ATM. White boxes denote site pairs that do not have a significant value by Fisher’s exact test,
indicating linkage equilibrium. Gray columns and rows denote SNPs that have a minor-allele frequency ≤.05. The asterisk (*) denotes an SNP
that is not polymorphic in the sample population.

Another test for recombination is that of Hudson and
Kaplan (1985). Based on the resulting matrix of the
four-gamete test, the Hudson and Kaplan parameter R is an
estimate of the minimum number of recombination
events in the history of the sample. For the white
European American population, this estimate is 0 (fig. 3).
For the other ethnic groups, R ranges from 4, in
Hispanics, to 2, in Asian Americans. The predicted sites of
recombination are similar among ethnic groups. African
Americans and Hispanics share two possible
recombination sites in the 5′ end of the gene, and a third, in the
3′ end, could also be in the same location. The Asian
American population shares one of the 5′-end sites and
has, in the middle of the gene, another potential site of
recombination, which is also present in Hispanics.
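
The way a minimum number of recombination events can be read off the four-gamete matrix is sketched below in Python. It follows the interval-counting idea commonly attributed to Hudson and Kaplan (1985): each site pair showing four gametes defines an interval that must contain at least one recombination event, and the minimum is the largest number of such intervals that do not overlap. The function name and the input format are assumptions, and DnaSP's own implementation may differ in detail.

```python
def hudson_kaplan_rm(four_gamete_site_pairs):
    """Return (R, intervals): the minimum number of recombination events and
    one set of non-overlapping intervals (i, j) that requires them.

    four_gamete_site_pairs: iterable of (i, j) site indices, i < j, for site
    pairs at which all four gametic types were observed.
    """
    # Greedy interval scheduling: take intervals in order of their right end
    intervals = sorted(four_gamete_site_pairs, key=lambda pair: pair[1])
    chosen = []
    last_right = None
    for left, right in intervals:
        # intervals that only share an end point are treated as non-overlapping
        if last_right is None or left >= last_right:
            chosen.append((left, right))
            last_right = right
    return len(chosen), chosen
```

For the site pairs (0, 3), (1, 2), (2, 5), and (4, 6), two events suffice, one between sites 1 and 2 and one between sites 2 and 5; a population with no four-gamete site pairs yields an estimate of 0.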

Further support for the hypothesis that there is minimal
recombination at the ATM locus is provided by the
results of Fisher’s exact test (Weir 1996). We computed
all possible pairwise comparisons between sites, to
determine the degree of nonrandom association between
sites. The majority of site pairs across all data sets show
significance (P < .001), indicating that there is extensive
disequilibrium at this locus (fig. 4). It has been
demonstrated that alleles with frequencies ≤.05 do not have
the power for detection of disequilibrium (Lewontin
1995; Goddard et al. 2000). In this analysis, we included
only SNPs having an allele frequency >.05. The Hispanic
and Asian American populations were in complete
disequilibrium. In the white European American
population, the pattern of equilibrium followed the SNP with
the lowest-frequency (.06) allele.


Disequilibrium was next measured by use of the
statistic $D'$, in a pairwise fashion across the 14 SNP loci
(fig. 5). $D' = D/|D|_{\max}$, where $D = p_{11} - p_1p_2$ and
$|D|_{\max} = \min(p_1p_2, q_1q_2)$ if $D < 0$ and
$|D|_{\max} = \min(q_1p_2, p_1q_2)$ if $D > 0$. $D'$ ranges from 1 to −1, and
this range is not influenced by allele frequency. A score
of either 1 or −1 is considered to represent perfect
disequilibrium. Interestingly, the results of this test are
virtually superimposable on the results of the four-gamete
test. The majority of site pairs are in perfect
disequilibrium. The white European American population is in
perfect disequilibrium across all sites. For the other
groups, the sites with $|D'| < 1$ are exactly the same sites
that have four gametes. We conclude that the ATM locus
exhibits reduced recombination and extensive
disequilibrium in all four ethnic groups, with the white
European American population being the most extreme case.

Figure 5  $D'$, measured as $D' = D/|D|_{\max}$ in a pairwise fashion across 14 SNP loci. A score of either 1 or −1 is considered perfect
disequilibrium. Black boxes denote site pairs with perfect disequilibrium; white boxes denote site pairs with $|D'| < 1$. The asterisk (*)
denotes an SNP that is not polymorphic in the sample population.

Association Study

Ultimately, we aim to use these ATM haplotypes for
association studies in populations with cancer. To
evaluate the potential that these haplotypes have for
identification of a particular mutation or polymorphism, we
performed a model association study. We tested the
ability of these haplotypes to detect, by association, three
different cSNPs in the ATM gene. These cSNPs were
found by sequencing the reverse transcriptase-PCR
products from ATM mRNA isolated from peripheral
blood lymphocytes from cancer patients. cSNP1 is
located in exon 4 and results in Ser49Cys. Positioned in
exon 38, cSNP2 results in Asp1853Asn; and cSNP3,
which results in Pro1054Arg, is located in exon 23. A
population of 941 individuals was screened for these
three cSNPs, by ASO hybridization. The resulting
frequencies of the cSNPs in this population are shown in
table 3.

In the model association study, samples from white
European Americans in the 941-individual collection
that were found to possess one of the three cSNPs were
considered to be the “case” population. The “control”
population consisted of samples from white European
Americans from the same collection that were randomly
chosen and negative for the cSNPs; because of the low
frequency of these cSNPs in other ethnic groups, only
samples from white European Americans were used in
this association study. All case and control samples were
genotyped for the 14 ATM neutral sequence variants,
via ASO hybridization. To assign haplotypes to
individual samples, we used Assign and the initial 22 ATM
haplotypes.

Table 3

Frequencies for Three ATM cSNPs in the Control Population

                    Frequency in 941 Individuals
         White European   African      Asian        Hispanic
         Americans        Americans    Americans    Americans
cSNP1    .005             .000         .001         .001
cSNP2    .066             .001         .002         .017
cSNP3    .015             .001         .000         .005

Table 4

Association of ATM Haplotypes and ATM cSNPs in Individual “Case” and Control Populations

                          Frequency inᵃ                      P for
                    Control          “Case”           Genotype        Haplotype
         Haplotype  Population       Population       Associationᵇ    Associationᶜ
cSNP1    2          .29 (c = 152)    .64 (c = 14)     .0478           .0166
cSNP2    15         .07 (c = 112)    .57 (c = 56)     .0000           .0000
cSNP3    17         .08 (c = 146)    .52 (c = 54)     .0000           .0000

Note: Samples carrying one of three ATM cSNPs were genotyped, by ASO hybridization, for the 14 ATM noncoding SNPs.
ᵃ Each cSNP was found to occur on separate ATM haplotypes. c = total number of white European American chromosomes genotyped.
ᵇ By 2 × 2 contingency table.
ᶜ By 3 × 3 contingency table.

Each cSNP showed a significant association with a
different, specific ATM haplotype (table 4). cSNP1
showed an association with haplotype 2, cSNP2 with
haplotype 15, and cSNP3 with haplotype 17. Haplotype
2 was present at a frequency of .29 in the control
population (no. of chromosomes [c] = 152) and at a
frequency of .64 in the cSNP1 population (c = 14);
haplotype 15 was present at a frequency of .07 in the control
population (c = 112) and at a frequency of .57 in the
cSNP2 population (c = 56); and haplotype 17 was
present at a frequency of .08 in the control population
(c = 146) and at a frequency of .52 in the cSNP3
population (c = 54). These are 2-fold, 8-fold, and 6.5-fold
increases in the frequencies of haplotypes 2, 15, and 17,
respectively; and the P values for these associations are
.0166, .0000, and .0000, respectively. Genotype
correlations were also present, with P values of .0478, .0000,
and .0000, respectively (table 4).
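
The fold increases quoted above follow directly from the control and case frequencies reported in table 4. The short Python check below recomputes them; the variable and function names are illustrative only.

```python
def fold_increase(control_freq, case_freq):
    """Ratio of a haplotype's frequency in cases to its frequency in controls."""
    return case_freq / control_freq


# haplotype: (frequency in controls, frequency in cases), as reported in table 4
table4 = {2: (0.29, 0.64), 15: (0.07, 0.57), 17: (0.08, 0.52)}

folds = {
    hap: round(fold_increase(control, case), 1)
    for hap, (control, case) in table4.items()
}
```

The ratios come out at about 2.2, 8.1, and 6.5 for haplotypes 2, 15, and 17, in line with the 2-fold, 8-fold, and 6.5-fold increases stated in the text.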

One of the great challenges in studying the genetics
of a complex disease such as cancer is its multifactorial
etiology. As presented in table 4, the data from our
simulated association study model a scenario in which all
“cases” are caused by a single mutation. To more
accurately simulate an association study with a complex
disease, we reanalyzed our data. We considered the three
groups of samples carrying the variant cSNPs as one
“case” population. In this analysis, two of the three
haplotypes that originally had shown an association
demonstrated a significant increase in frequency (table 5).
No increase in frequency was apparent for haplotype 2,
which had previously shown a twofold increase in the
cSNP1 population; haplotype 15 showed a fourfold
increase (P = .0002); and haplotype 17 showed a threefold
increase (P = .0002). Thus, we successfully
demonstrated the ability of these ATM haplotypes to discern
members of our case population who carry a particular
SNP. The results of these studies indicate a significant
potential for the use of haplotypes extending over a large
genomic region, to detect disease associations through
a case-control-study design in a general population.

Discussion 

In this study, we have presented a strategy for uncovering
the genetic contribution to complex disease. Specifically,
we have demonstrated the utility of a complex
SNP-based-haplotype approach to association studies and
have detected significant LD at the ATM locus, extending
~142 kb. The results of this study provide proof of
principle for the use of SNP-haplotype data in the detection
of genetic factors contributing to complex disease.

We sequenced 13.5 kb of the ATM gene in five
unrelated individuals and detected 17 SNPs in noncoding
regions. We then utilized these neutral sequence variants
spanning 142 kb of the ATM gene to construct
haplotypes for this genomic locus. The
expectation-maximization algorithm EMHAPFRE (Excoffier and Slatkin
1995) was used to predict haplotypes from genotype
data on 295 individuals from four ethnic groups.
Twenty-two haplotypes and their frequencies were
predicted by EMHAPFRE, for the total population. Three
of these 22 haplotypes have a frequency of ≥10%. This
concurs with the findings of Li et al. (1999), who also
used neutral sequence variants to detect three major
haplotypes at the ATM locus. Six of the 22 haplotypes
exist in all four ethnic populations in our study and are
also the most commonly occurring haplotypes. There
are 11 private haplotypes, each of which has a frequency
of <1%.

Table 5

Association of ATM Haplotypes and ATM cSNPs in Combined Case Population

Haplotype   Frequency in Combined       P for Genotype   P for Haplotype
            Case Population (c = 124)   Association      Association
2           .19                         .9644            .7606
15          .27                         .0000            .0002
17          .24                         .0000            .0002

Note: See footnotes to table 4.

We verified the reliability of the haplotype-prediction algorithm by
using several tests. First, we genotyped individuals from nine
three-generation CEPH families (n = 87). This allowed us to determine
haplotypes by inspection of allele segregation. The CEPH genotype data
were also analyzed by EMHAPFRE, and the resulting haplotypes agreed
completely with those inferred on the basis of transmission data. Next,
we used Assign, a script written in Microsoft Excel Visual Basic, to
assign pairs of haplotypes to individual genotypes. Given the 22
haplotypes predicted by EMHAPFRE, Assign successfully resolved the
genotype data for all 295 individuals in this study. The results of
EMHAPFRE were tested against another haplotype-prediction program, one
that does not use the expectation-maximization algorithm and that does
not assume Hardy-Weinberg equilibrium. This program, termed “Data
Mining,” uses the resulting matrix of the four-gamete test to inform the
process of haplotype prediction so that recombination may influence
outcome (N. Wang, R. Chakraborty, M. Kimmel, and L. Jin, personal
communication). There were minor differences in the results of this
comparison. For the population of white European Americans, the outcome
of each program was identical. This is not surprising, since the
four-gamete test reveals no evidence for recombination in this
population. The results of these trials confirm that EMHAPFRE was
successful in estimating the correct haplotypes necessary to
sufficiently resolve our data set. We feel confident that the size and
diversity of our data set have allowed us to describe in relative depth
the haplotype architecture of ATM. Consequently, we have chosen to use,
as the foundation for further studies, the 22 haplotypes predicted from
the complete data set.
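
The pair-assignment step described above can be illustrated with a short sketch. This is not the authors' Assign script (which was written in Excel Visual Basic and is not reproduced here); it is a minimal Python illustration that assumes each haplotype is a string with one allele per SNP and each unphased genotype is a list of two-allele strings. The function names and the three-site example haplotypes are hypothetical.

```python
from itertools import combinations_with_replacement


def genotype_of(h1, h2):
    """Unphased genotype implied by two haplotypes: a sorted allele pair per site."""
    return tuple("".join(sorted(a + b)) for a, b in zip(h1, h2))


def assign_pairs(genotype, haplotypes):
    """Return every pair of known haplotypes that reproduces the genotype."""
    target = tuple("".join(sorted(site)) for site in genotype)
    return [
        (h1, h2)
        for h1, h2 in combinations_with_replacement(haplotypes, 2)
        if genotype_of(h1, h2) == target
    ]


# Hypothetical three-site haplotypes; the alleles are illustrative only.
known = ["tca", "cca", "ttg"]
print(assign_pairs(["ct", "cc", "aa"], known))
print(assign_pairs(["tt", "tt", "aa"], known))
```

A genotype is resolved when at least one pair is returned; an empty list means that the predicted haplotype set cannot explain that individual, and more than one pair means that the assignment is ambiguous.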

With a minimal amount of sequencing (13.5 kb in five individuals), we
were able to detect highly informative neutral sequence variants
spanning a large genomic region. In sequencing 10 chromosomes from white
European Americans, we found SNPs that have a common occurrence in four
different ethnic groups. In all ethnic groups, the majority (11 of 14)
of SNPs identified in this study have a minor-allele frequency of >25%.
SNPs with frequencies in the range of .2-.5 have the highest information
content for association and LD studies (Kruglyak 1997). Although most
SNPs had a high minor-allele frequency in all ethnic groups, allele
frequencies varied across ethnic groups. This is in accordance with
several other studies that have found population differences in
SNP-allele frequencies (Lai et al. 1998; Nickerson et al. 1998; Cargill
et al. 1999; Halushka et al. 1999; Goddard et al. 2000). Variations in
allele frequencies are most pronounced in the African American
population. Four SNPs (IVS21-77t→c, IVS55+186c→t, IVS62+424g→a, and
IVS62-973a→c) have a minor-allele frequency that is reduced by 40%-75%
in African Americans, compared with that in other ethnic groups. A fifth
SNP, IVS46-257a→c, was not found in the African American samples. These
differences illustrate that there is population structure in SNP-allele
frequencies that is an important factor to consider when SNP-based
association and LD studies are designed.


Comparison of genotype data from six great apes was instructive for
approximating ancestral haplotypes and SNP alleles. Genotyping revealed
three haplotypes in this population, none of which is identical to the
human ATM haplotypes. Of the 14 SNPs, 2 showed variation in the ape
population. One common chimpanzee was heterozygous for IVS62+424g→a, and
all three gorillas were homozygous for a third allele (A) at
IVS8-356t→c. The extent of homozygosity in this sample indicates that
most of the SNPs found varying in the human population have arisen since
the divergence of the human lineage from the last common ancestor shared
with the chimpanzee. This agrees with the assertion that most current
neutral human polymorphisms are not shared with the chimpanzee (Hacia et
al. 1999). It may also imply that these SNPs are not hypermutable sites,
since, if they were, more variation might be expected in the 12 primate
chromosomes analyzed. Although these SNPs are common in man, they are
not due to hypermutability; rather, they are old enough to be found
throughout diverse ethnic groups.

The results of this study show a remarkable lack of recombination at the
ATM locus. This effect is most profound in the white European American
population, in which no evidence for recombination is detected by the
four-gamete test and in which D' shows perfect disequilibrium across all
SNPs. Low recombination is implicated for the African American, Asian
American, and Hispanic groups as well. The possibility of low
recombination was suspected on the basis of the seemingly small number
of haplotypes found at this locus. Twenty-two haplotypes with 14 loci is
not considerably greater than the n + 1 (i.e., 15) that would be
expected if there is no recombination. Another study, performed in
parallel with this one, used the same approach as that described here
and serves as a direct comparison: D. Trikka, Z. Fang, A. Renwick, S.
Jones, R. Chakraborty, M. Kimmel, and D. L. Nelson (unpublished data)
used neutral sequence variants dispersed across the BLM, WRN, and RECQL
loci, to derive haplotypes for these regions; their study used the same
sample population, with fewer SNPs (8, 13, and 11, respectively) for
haplotype construction, and found considerably larger numbers of
haplotypes (50, 56, and 47, respectively) at each locus. The key
difference between these loci and ATM is the amount of recombination and
LD reported. Trikka et al. found more evidence for recombination and
linkage equilibrium when the four-gamete test and Fisher’s exact test
were used. For ATM, the four-gamete test revealed few site pairs with
four gametes. The Hudson-Kaplan recombination statistic R ranged from 0,
in white European Americans, to 4, in Hispanics. Analysis by both
Fisher’s exact test and D' indicated extensive LD for ATM, in all ethnic
groups studied. Figure 5 shows extensive disequilibrium, with >72% of
site pairs having perfect disequilibrium in all ethnic groups.
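
The two pairwise measures discussed above, the four-gamete test and D', can be sketched for a single pair of biallelic sites. This is a minimal illustration rather than the software used in the study: it assumes phased haplotypes given as strings, takes the alleles of the first haplotype as the reference alleles, and uses the standard normalization of Lewontin's D'. The function names and example haplotypes are hypothetical.

```python
def four_gamete(haplotypes, i, j):
    """True if all four two-site gametes are observed at biallelic sites i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def d_prime(haplotypes, i, j):
    """Absolute value of Lewontin's D' for two biallelic sites."""
    n = len(haplotypes)
    a = haplotypes[0][i]  # reference allele at site i
    b = haplotypes[0][j]  # reference allele at site j
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


# Hypothetical two-site haplotypes: three gametes only, then all four.
three_gametes = ["tc", "tc", "ca", "ca", "ta"]
four_gametes = three_gametes + ["cc"]
print(four_gamete(three_gametes, 0, 1), d_prime(three_gametes, 0, 1))
print(four_gamete(four_gametes, 0, 1), d_prime(four_gametes, 0, 1))
```

When only three of the four possible gametes are present, D' equals 1 (perfect disequilibrium) and the four-gamete test gives no evidence of recombination; once the fourth gamete appears, D' falls below 1.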

Using a model association study, we have successfully demonstrated the
ability of ATM haplotypes to identify chromosomes carrying specific
coding polymorphisms. The three cSNPs that we used as candidates for
detection had varying frequencies in our control population of white
European Americans (cSNP1, .005; cSNP2, .066; and cSNP3, .015). When
each of the three cSNP populations was analyzed individually, each cSNP
showed a significant association with a different ATM haplotype. cSNP1
showed an association with haplotype 2, cSNP2 with haplotype 15, and
cSNP3 with haplotype 17 (P = .0166, .0000, and .0000, respectively); the
increase in haplotype frequency in cases versus controls was 2-fold,
8-fold, and 6.5-fold, respectively. To model the potential for
multiallelic etiology of a complex disease, we combined the three
populations of samples carrying the cSNPs into one “case” population. In
this analysis, two haplotypes demonstrated a readily detectable increase
in frequency: haplotype 15 showed a fourfold increase, and haplotype 17
showed a threefold increase in frequency; no frequency increase was
apparent for haplotype 2, which had previously shown a twofold increase
in the cSNP1 population.

The association that becomes undetectable (i.e., haplotype 2 with cSNP1)
involves the haplotype occurring most commonly (frequency .29) in the
general population. Haplotype 15 shows the greatest increase in
frequency and is the least common of the three haplotypes, with a
control frequency of .05. This leads us to an important point for future
association studies. Haplotypes with lower frequencies in control
populations may be more effective for detection of associations.
However, it is important to note that haplotype 17, which is the third
most frequent haplotype (frequency .10), nevertheless showed a 2.6-fold
increase in frequency in the combined cSNP population. An additional
factor contributing to detection in this study is frequency of the
mutation. In the case of cSNP1 and haplotype 2, in which the association
becomes undetectable, the most frequent haplotype was associated with
the least common SNP (cSNP1, .006). The difference in frequency between
cSNP1 and the other cSNPs is a factor of 10. Both the haplotype
frequency and the cSNP frequency contribute to detection. This
underscores the idea that several factors, including frequency of
haplotype, frequency of mutation, and age of mutation, contribute to
limits of detectability.

This model association study demonstrates proof of principle for the use
of complex SNP haplotypes covering candidate genes, in the detection of
genetic factors contributing to complex disease. We have successfully
demonstrated the ability of these ATM haplotypes to discern members of
our “case” population that carry a particular coding SNP. The results of
these studies indicate that haplotypes extending over a large genomic
region have a significant potential for detection of disease
associations.

There is much interest in the use of SNPs in genomewide association
studies and other LD-based strategies. Our approach and analyses bear on
those strategies, in several regards. First, LD estimates from
simulation studies have been as low as 3 kb of meaningful LD (Kruglyak
1999). This calculation suggests that a very-high-density map with as
many as 0.5-3 million SNPs would be necessary for effective association
studies (Kruglyak 1999). Our results and those of other studies (Collins
et al. 1999; Eaves et al. 2000; Moffatt et al. 2000; Taillon-Miller et
al. 2000) indicate, to the contrary, that significant LD can be found
extending as far as several hundred kilobases. This should reduce the
number of SNPs necessary for genomewide linkage studies. Comparison of
LD at ATM versus the results of the LPL study (Clark et al. 1998), in
which LD patterns were complex over just 9.7 kb, supports the idea that
LD varies widely throughout the genome, indicating that some regions
will require SNPs that are more densely spaced. Second, higher-frequency
(.2-.5) SNPs are more robust, whereas rare SNPs may be less useful and,
in some analyses, may confound results. More than half of the SNPs used
to construct haplotypes in the LPL study had a relative allele frequency
of <.2. This resulted in 67 of 71 individuals having a unique haplotype.
By using fewer markers (14) with higher frequency (.20), we were able to
effectively elucidate the haplotype architecture and the LD and
recombination profiles for the ATM genomic locus (142 kb). These
haplotypes were used successfully in association studies, to detect
coding polymorphisms in the ATM gene. We conclude that reasonably
spaced, highly informative SNPs have the ability to define a larger
number of ancestral chromosomes and have increased power for
population-based association studies.
Appendix A


Primers  Used  for  PCR  and  Sequencing 


f1.atm, ATGGTCATCTCGTTACAGGCAATGC
r1.atm, CCCCAAGTGACTGAAGGCATCTAGG
f2.atm, TGGTGGAACCATTTCCGTTTAACG
r2.atm, GCGCCCTTCTAATAACCCGCC
f3.atm, GCCCAGAACCTCCGAATGACG
r3.atm, CGACTTAGCGTTTGCGGCTCG
f4.atm, TGGCTGGCAACATTACCAACTGC
r4.atm, TGCATCTTTTTCTGCCTGGAGGC
f5.atm, TGTGTGCTAGGGAGGAATCTGGTGG
r5.atm, GGCTGTCTCTAGGCTTGTTGAGGGC
f6.atm, CCATCATCCGAAAGGAGCCAAAAC
r6.atm, GCAGCAATTTCCCTGTTTCTGCC
f7.atm, AAATTGGCAGGATGATGAGGATGC
r7.atm, GCTGTCAAGCTGCATCAGCGTTAG
f8.atm, CCAAAGCGTGCCAGAATGGTATG
r8.atm, CCAAAGCGTGCCAGAATGGTATG
f9.atm, GGTATGCGTAGCGGGGCTAGTGAG
r9.atm, CGCAGGAAAAAGCCAGATGCAATC
f10.atm, GCCCTAGCCCCAGTGTATGTGGAG
r10.atm, GGCAGCCAGTTTCCGAGAACTACC
f11.atm, TTTTTGGCAAGGTGAGTATGTTGGC
r11.atm, TGCGAACTTGGTGATGATTGTCAGC
f12.atm, AGATTGTTCCAGGACACGAAGGGAG
r12.atm, TTTCTTCCCATTGTCACCTGTTCCC
f13.atm, TGCGAAAAACAGGCTTTGTTTGC
r13.atm, GGTGATGGAAAAGAGACGGGGC
f14.atm, GCAAGTCCCCTCACCAGCAACAC
r14.atm, GATGCCTTCCCATCATCCTGATACC
f15.atm, TCTGGGAAGAAGTTACGCAGGGAAC
r15.atm, CTGACTGGCACTAGAATTTGCTGGC
f16.atm, GGCGGAATGAATGTGAGTTATGCG
r16.atm, CCAGGTGATTTCTCCATCCCGTG
f17.atm, CTGCCTAAAGCAGCAGTTTTTGCC
r17.atm, TGTTGCTATCCCGAAGCTGAAACC
f18.atm, GGTGTGTAAGCAAGAATGCCTGGG
r18.atm, GCCACAGATTTTGAGACCACTGCAC
f19.atm, TAGTTTGTATGGCTGTGGTGGAGGG
r19.atm, CATCCCTCTGCTTCAGGAGTATCCC
f20.atm, CCAGTAGGGGGTCCCTCATTTCC
r20.atm, TGAGAAGCTGGGAGTGTTTCTGCC
f21.atm, CCCCGTACATGAAGGGCAGTTG
r21.atm, TGGGTGGCTGGGCTAATGAAGAG
f22.atm, GGTTCAGCGAGAGCTGGAGTTGG
r22.atm, GCAGCAGGGGGAAAACCCAC
f23.atm, CCACAGATTAGCAACAAGTTGGGGC
r23.atm, TGGCATAAGCACACGGAAACTCTCC
f24.atm, AGGTTCCGATGGCAAGGAGAGG
r24.atm, CTGTGTCTTTCCACCACTCCCCAG
f25.atm, CAGTCATGGTTCTGGGGAGAGAAGC
r25.atm, GCCTTTCTGATTTCCCTTCCTGCC
f26.atm, CTTGATGGTGGGAGGGACTTAGGG
r26.atm, TGCCTAGATGTTTGAGAGCCTGCC
f27.atm, CAGGGCACACAGGGTACAGTGTAGG
r27.atm, TCAGTTCAGACCATCTCATGCCTCC
f28.atm, CAGGGGGATGATAGTGATGATGTGG
r28.atm, TTCAAAACATACATGCCCTGCCTTC
f29.atm, CAAAGACTGAGAGCTGAGCCCAGTG
r29.atm, GCACAATCTCCTCCTTTCTGCTGC
f30.atm, TGGTTTAGAAATGCCTTCAGCCCC
r30.atm, TGCACTCTACCTGCCATGCTTCC
f31.atm, GCCATGTCAGTGCCCAACTTGAAG
r31.atm, TTGGTGCTGCGTTTGGAATCTTG
f32.atm, GATTCCAAACGCAGCACCAAACC
r32.atm, GGTAGTTGATGGGGGAGGGGAAC
f33.atm, GTTCCCCTCCCCCATCAACTACC
r33.atm, GAGCACAGTGCCTTCTTCCACTCC
f34.atm, CCCTGACAATCTGGGGCACAAAC
r34.atm, CCGTGGCTTTTGCTGGCATTC
f35.atm, GTCCTGTGGCATTGTGCATAACTCC
r35.atm, GCAGACATTAGGCATAAGCCCCTTC
f36.atm, GATGACTGCCCTTGTTCCCCAAG
r36.atm, TGGTTAAGTTGCTTTTCCCCCCAG


Appendix  B 


Primer  Sequences  and  Concentrations  Used  for  Multiplex  PCR 


Group 8:

3F ATM, 5'-GCCCAGAACCTCCGAATGACG-3'; and 3R-2 ATM, 5'-GCCGTGAAGCGAAAGAGGCG-3' (0.25 µM)
11F ATM, 5'-TTTTTGGCAAGGTGAGTATGTTGGC-3'; and 11R ATM, 5'-TGCGAACTTGGTGATGATTGTCAGC-3' (0.25 µM)
14F ATM, 5'-GCAAGTCCCCTCACCAGCAACAC-3'; and 14R ATM, 5'-GATGCCTTCCCATCATCCTGATACC-3' (0.25 µM)
23F-2 ATM, 5'-GGTGGAATCTGGTCTAGTTACCC-3'; and 23R ATM, 5'-TGGCATAAGCACACGGAAACTCTCC-3' (0.25 µM)
27F ATM, 5'-CAGGGCACACAGGGTACAGTGTAGG-3'; and 27R ATM, 5'-TCAGTTCAGACCATCTCATGCCTCC-3' (0.188 µM)
29F ATM, 5'-CAAAGACTGAGAGCTGAGCCCAGTG-3'; and 29R ATM, 5'-GCACAATCTCCTCCTTTCTGCTGC-3' (0.125 µM)
30F ATM, 5'-TGGTTTAGAAATGCCTTCAGCCCC-3'; and 30R-2 ATM, 5'-CAGCCAGTCCAACATAAATCAG-3' (0.25 µM)
31F ATM, 5'-GCCATGTCAGTGCCCAACTTGAAG-3'; and 31R ATM, 5'-TTGGTGCTGCGTTTGGAATCTTG-3' (0.25 µM)

Group 7:

7R ATM, 5'-GCTGTCAAGCTGCATCAGCGTTAG-3'; and 7F-2 ATM, 5'-GTTGGATTACCATGTTCACCAG-3' (0.188 µM)
10F ATM, 5'-GCCCTAGCCCCAGTGTATGTGGAG-3'; and 10R-2 ATM, 5'-GCAGAGATAATCATGGGCAGG-3' (0.25 µM)
15F ATM, 5'-TCTGGGAAGAAGTTACGCAGGGAAC-3'; and 15R-2 ATM, 5'-TGGGGAGACTATGGTAAAAGAGG-3' (0.31 µM)
16F ATM, 5'-GGCGGAATGAATGTGAGTTATGCG-3'; and 16R ATM, 5'-CCAGGTGATTTCTCCATCCCGTG-3' (0.25 µM)
20F ATM, 5'-CCAGTAGGGGGTCCCTCATTTCC-3'; and 20R ATM, 5'-TGAGAAGCTGGGAGTGTTTCTGCC-3' (0.25 µM)
25F ATM, 5'-CAGTCATGGTTCTGGGGAGAGAAGC-3'; and 25R-2 ATM, 5'-CTATCAATATCTAGCTCTGGGGC-3' (0.15 µM)
28F ATM, 5'-CAGGGGGATGATAGTGATGATGTGG-3'; and 28R ATM, 5'-TTCAAAACATACATGCCCTGCCTTC-3' (0.5 µM)


Appendix  C 


Probes  Used  for  ASO  Hybridization 


ATMAso 3T, 5'-TAACCCTCCTTCCCGC-3'
ATMAso 3a, 5'-TAACCCTCCATCCCGC-3'
ATMAso 7T, 5'-AAGGAACTTGTAATATTTTTC-3'
ATMAso 7c, 5'-AGGAACTCGTAATATTTTTC-3'
ATMAso 10T, 5'-TGGGAAACATGACCAGGG-3'
ATMAso 10c, 5'-GGGAAACACGACCAGGG-3'
ATMAso 11T, 5'-GTAACTTATAATAACCTTTC-3'
ATMAso 11c, 5'-GAAGTAACTTACAATAACC-3'
ATMAso 14C, 5'-TCTGTACAAGAAAAATTTG-3'
ATMAso 14g, 5'-TCTGTAGAAGAAAAATTTG-3'
ATMAso 15C, 5'-TTTCCTCTCAGTCTACAGG-3'
ATMAso 15t, 5'-TTTTTTCCTCTTAGTCTACAGG-3'
ATMAso 16C, 5'-TAGAGATGATGTCGGCTTC-3'
ATMAso 16t, 5'-CTAGAGATGATGTTGGCTTC-3'
ATMAso 20A, 5'-GTAATGTCAGAGTATTAAA-3'
ATMAso 20c, 5'-TAATGTCAGCGTATTAAA-3'
ATMAso 23T, 5'-CAAAAGCTTCTCTTGCTTT-3'
ATMAso 23c, 5'-AAAAGCTTCTCCTGCTTTC-3'
ATMAso 25C, 5'-TTTTTTGTGGCATTCACAC-3'
ATMAso 25t, 5'-TTTTTTGTGGTATTCACAC-3'
ATMAso 27C, 5'-CTGCTCATGCCTCCTCTC-3'
ATMAso 27t, 5'-CTGCTCATGTCTCCTCTCC-3'
ATMAso 28C, 5'-TTCTATTAAACAGTATTA-3'
ATMAso 28a, 5'-TTCTATTAAAAAGTATTA-3'
ATMAso 29.1T, 5'-GATAAAGATATGTTGACAA-3'
ATMAso 29.1c, 5'-GATAAAGATACGTTGACAA-3'
ATMAso 29.2C, 5'-ACTTCCTGACGAGATACAC-3'
ATMAso 29.2t, 5'-ACTTCCTGATGAGATACAC-3'
ATMAso 30c, 5'-CCTAAGCCACGTTCCTCTA-3'
ATMAso 30t, 5'-CCTAAGCCATGTTCCTCTA-3'
ATMAso 31.1C, 5'-AAATAGAGCGATTTTGGTT-3'
ATMAso 31.1t, 5'-AAATAGAGAGATTTTGGTTC-3'
ATMAso 31.2C, 5'-AGAAATTCCTCATGAACTC-3'
ATMAso 31.2a, 5'-AGAAATTCATCATGAACTC-3'
Abstract

Background/Purpose:  In  in  vivo  models,  radiation-induced  genomic  instability 
correlates  with  the  risk  of  breast  cancer  development.  In  addition,  homozygous 
mutations  in  tumor  suppressor  genes  associated  with  breast  cancer  development 
adversely affect the processing and repair of radiation-induced DNA damage. We
performed  a  case-control  study  to  determine  whether  an  assay  measuring  radiation- 
induced  chromatid  breaks  correlated  with  the  risk  of  having  bilateral  breast  cancer. 

Materials  &  Methods:  Patients  were  prospectively  studied  on  an  institutional  review 
board-approved  protocol.  We  included  only  women  with  bilateral  breast  cancer  as  cases 
in  order  to  obtain  patients  with  a  presumed  genetic  susceptibility  for  breast  cancer. 
Controls  were  healthy  women  without  a  previous  cancer  history.  A  mutagen  sensitivity 
assay using γ-radiation was performed on lymphocytes obtained from 26 cases and 18
controls.  One  milliliter  of  whole  blood  was  cultured  with  9  ml  of  blood  medium  for  67  h 
and  then  treated  with  125  cGy  using  a  Cs-137  irradiator.  Following  an  additional  4  h  in 
culture,  cells  were  treated  with  Colcemid  for  1  h  to  arrest  cells  in  metaphase.  The 
number  of  chromatid  breaks  per  cell  was  counted  using  a  minimum  of  50  metaphase 
spreads  for  each  sample. 

Results: Cases had a significantly higher number of γ-radiation-induced chromatid breaks
per cell than controls, with mean values of 0.61 +/- 0.24 versus 0.45 +/- 0.14, respectively
(p=0.034, Wilcoxon rank sum test). Using the 75th percentile value in the control group
as a definition of radiation sensitivity, the radiation-sensitive individuals had an odds
ratio for breast cancer development of 2.83 compared with individuals who were
not radiation sensitive (95% confidence interval, 0.83 - 9.67).

Conclusions:  These  preliminary  data  suggest  that  sensitivity  to  radiation-induced 
chromatid  breaks  in  lymphocytes  correlates  with  the  risk  of  bilateral  breast  cancer. 
Although  the  differences  between  cases  and  controls  were  statistically  significant,  the 
small  sample  size  necessitates  that  this  finding  be  validated  in  a  larger  study.  More  data 
are  also  needed  to  determine  whether  this  sensitivity  is  limited  to  breast  cancer  patients 
with  a  genetic  susceptibility  for  the  disease  or  also  applies  to  the  general  breast  cancer 
population. 

Key  Words:  breast  cancer,  radiosensitivity,  chromatid  breaks,  radiation 
Introduction

A  biological  predictive  assay  of  breast  cancer  development  risk  would  have 
significant  relevance  to  a  large  cohort  of  women.  Breast  cancer  is  the  most  common 
nondermatological  cancer  in  women,  with  an  estimated  182,800  new  cases  diagnosed 
annually  in  the  United  States  (1).  Furthermore,  breast  cancer  remains  the  second  leading 
cause  of  cancer  deaths  in  women,  with  40,800  predicted  to  die  of  the  disease  in  the  year 
2000  (1).  A  biological  assay  quantifying  an  individual’s  risk  of  breast  cancer 
development  could  help  identify  candidates  for  judicious  clinical  and  radiographic 
screening,  for  trials  evaluating  chemoprevention  strategies,  and  for  consideration  of 
prophylactic  surgical  interventions. 

There  are  two  types  of  biological  predictors  of  breast  cancer  risk:  genotype 
sequencing  and  phenotype  screening.  The  discovery  and  cloning  of  BRCA1  and  BRCA2 
have  permitted  the  development  of  a  commercially  available  sequencing  test  to  identify 
germline  mutations  in  these  two  genes.  In  addition,  the  relationships  of  mutations  in 
other  candidate  genes,  such  as  ATM  (ataxia  telangiectasia,  mutated),  to  breast  cancer 
development  are  being  investigated  by  a  number  of  groups.  While  genotype  sequencing 
has  had  a  dramatic  impact  on  understanding  breast  cancer  risk  in  selected  cases,  only  a 
small  percentage  of  breast  cancer  patients  develop  the  disease  in  the  setting  of  a  known 
predisposing  germline  mutation  (2,3). 

In  this  preliminary  report,  we  investigate  a  phenotype-screening  assay  for  breast 
cancer.  Phenotype  screening  has  a  number  of  advantages  and  disadvantages  compared 
with  genotype  sequencing.  By  evaluating  a  common  downstream  consequence  of  a 
variety of tumor suppressor gene mutations, a phenotype assay can potentially capture a
much  broader  percentage  of  the  breast  cancer  population.  Furthermore,  this  strategy  is 
not  dependent  on  new  gene  discovery  and  potentially  can  identify  individuals  who  harbor 
relevant  germline  mutations  in  yet  undiscovered  genes.  A  phenotype-screening  assay 
also affords the possibility of quantifying the importance of an individual’s genotype.
For example, a phenotype-screening assay may be able to quantitatively distinguish
between  different  mutations  in  a  tumor  suppressor  gene  that  entail  different  risks  of 
breast  cancer  development. 

The  assay  we  investigated  in  this  study  was  cellular  radiosensitivity,  as  defined  by 
the  number  of  chromatid  breaks  per  cell  following  in  vitro  treatment  of  lymphocytes  with 
γ-radiation.

Materials  and  Methods 

Approval  for  this  prospective  study  was  obtained  through  The  University  of 
Texas  M.  D.  Anderson  Institutional  Review  Board.  Informed  consent  was  obtained  from 
all  cases  and  controls  in  this  study. 

Cases were 26 women with a history of bilateral breast cancer. All women had at
least one breast cancer treated in our institution. No samples were obtained from the
cases  during  chemotherapy  or  radiation  treatment  because  these  treatments  could 
potentially  affect  the  number  of  chromatid  breaks.  Controls  were  18  women  with  no 
personal cancer history who were recruited for a simultaneous study investigating lung
cancer.

All participants donated 10 - 20 ml of blood for the mutagen sensitivity assay.
The details of the mutagen sensitivity assay have been previously described (4), although
in this study γ-radiation rather than bleomycin was used as the mutagenic agent. All cultures
were set up within 24 h of the blood draw. One milliliter of blood was added to 9 ml of
RPMI-1640 medium supplemented with 20% fetal calf serum and phytohemagglutinin.
The cultures were then incubated at 37°C for 72 h, after which the cultures were treated
with 125 cGy of γ-radiation delivered from a Cs-137 irradiator. The cultures were then
incubated for 4 h to allow time for DNA repair. Subsequently, the cultures were treated
for 1 h with Colcemid (0.04 µg/ml) to arrest cells in metaphase. Cells were then
harvested, fixed, washed, and stained with Giemsa as previously reported (4). For each
case and control, the number of chromatid breaks per cell was counted. A minimum of
50 metaphase spreads per sample were examined.

The mean values, standard deviations, and standard errors were calculated. A
Wilcoxon rank sum test, which does not assume a normal distribution, was used to
compare cases and controls. Because this test analyzes the ranks of the values rather
than their magnitudes, it minimizes the impact that a single high mutagen sensitivity
score could have on the comparison. Odds ratios for bilateral breast cancer were
determined by comparing the frequency of bilateral breast cancer in mutagen-sensitive
individuals and mutagen-resistant individuals. Consistent with previous reports, the
cutoff for being categorized as mutagen sensitive was the 75th percentile value of the
control population. This value was determined prior to the analysis of the data and
represents an accepted quartile cutoff point.
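
The dichotomization and odds-ratio calculation described above can be sketched as follows. This is an illustration only: the text does not state which quantile convention or which confidence-interval method was used, so the inclusive quartile method and Woolf's log-based interval are assumptions, and the counts and scores in the example are hypothetical rather than the study data.

```python
import math
import statistics


def sensitivity_cutoff(control_scores):
    """75th percentile of the control scores (inclusive quartile method, assumed)."""
    return statistics.quantiles(control_scores, n=4, method="inclusive")[2]


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) 95% confidence interval.

    a, b: mutagen-sensitive and mutagen-resistant cases
    c, d: mutagen-sensitive and mutagen-resistant controls
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical breaks-per-cell scores and 2x2 counts, for illustration only.
cutoff = sensitivity_cutoff([0.2, 0.4, 0.6, 0.8, 1.0])
estimate, lower, upper = odds_ratio_ci(12, 8, 5, 15)
print(cutoff, estimate, lower, upper)
```

A confidence interval whose lower bound falls below 1 indicates that the odds ratio is not statistically distinguishable from no association at the 5% level.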
Results

Characteristics  of  Cases 

The  median  age  of  the  cases  at  the  time  of  first  breast  cancer  diagnosis  was  49 
years  with  a  range  of  25  -  79.  Sixty-five  percent  of  the  cases  had  a  history  of  breast 
cancer  in  either  a  primary  relative  (38%)  or  a  secondary  relative  (27%),  31%  of  the  cases 
denied  a  breast  cancer  family  history,  and  in  1  case  the  family  history  was  unknown.  The 
majority of cases (88%) were Caucasian. Only two of the cases were of Ashkenazi
Jewish descent, and 1 of these was known to have a germline mutation in BRCA1. From
a published nomogram for predicting the probability of having a BRCA1 mutation based
on  personal  cancer  history,  family  cancer  history,  age  of  diagnosis,  and  whether  the 
individual  is  of  Ashkenazi  descent  (5),  the  approximate  average  probability  of  having  a 
BRCA1  mutation  for  our  cases  was  15%. 

Mutagen Sensitivity Assay Results

The  number  of  chromatid  breaks  per  cell  was  significantly  higher  in  our  cases 
versus  controls,  with  respective  values  of  0.61  +/-  0.24  (standard  deviation)  and  0.45  +/- 
0.14  (p=0.034).  Figure  1  shows  the  distribution  of  cases  and  controls  according  to  the 
number  of  chromatid  breaks  per  cell.  As  shown,  the  distribution  of  the  cases  is  skewed  to 
the  radiosensitive  end  of  the  graph. 

The  data  were  also  analyzed  to  determine  the  odds  ratio  for  breast  cancer 
development  for  mutagen-sensitive  and  mutagen-resistant  individuals.  Consistent  with 
previous  studies  using  the  mutagen  sensitivity  assay,  we  dichotomized  cases  and  controls 
as being sensitive or resistant at the 75th percentile of the controls (0.56 chromatid breaks per
cell). This analysis revealed that the mutagen-sensitive individuals had an odds ratio for
breast cancer development of 2.83 (95% confidence interval, 0.83 - 9.67).

A  comparison  was  also  performed  between  the  cases  with  a  positive  (n=17)  or 
negative  (n=8)  family  history.  These  results  revealed  that  the  cases  with  a  positive 
family  history  had  a  higher  number  of  chromatid  breaks  per  cell  than  those  with  a 
negative  family  history,  although  the  difference  between  the  two  groups  was  not 
statistically  significant  (0.67  +/-  0.14  versus  0.49  +/-  0.25,  p=0.07).  The  distribution  of 
these  results  is  shown  in  Figure  2. 


Discussion 

In  this  paper,  we  present  evidence  that  the  phenotype  of  cellular  radiosensitivity, 
as  defined  by  a  chromatid-break  assay,  correlates  with  the  risk  of  having  bilateral  breast 
cancer.  Specifically,  we  found  that  radiation  induced  a  greater  number  of  chromatid 
breaks  in  lymphocytes  from  patients  with  a  history  of  bilateral  breast  cancer  compared  to 
female  controls  without  a  cancer  history. 

This  study  followed  an  earlier  negative  report  from  our  institution  investigating 
the  value  of  the  mutagen  sensitivity  assay  in  predicting  breast  cancer  risk.  In  1989,  Hsu 
et  al.  reported  no  increase  in  the  number  of  chromatid  breaks  per  cell  in  82  breast  cancer 
cases  compared  to  335  controls  (0.64  +/-  0.36  versus  0.60  +/-0.35,  respectively)  (4).  We 
designed  this  current  protocol  with  important  differences  from  the  earlier  Hsu  et  al.  study. 
First, by evaluating only patients with a personal history of bilateral breast cancer (2/3 of
whom  also  had  a  positive  family  history  of  breast  cancer)  we  selected  cases  that  had  a 
greater  probability  of  having  a  predisposing  genotype.  In  the  original  Hsu  study,  patients 
with  a  history  of  a  single  breast  cancer  were  selected  without  regard  to  age  at  diagnosis  or 
family history status. A second important difference between the two studies was our use
of γ-radiation as a mutagen compared to the bleomycin that was used in the earlier study.

The  rationale  for  reinvestigating  the  mutagen  sensitivity  assay  in  our  breast  cancer 
study  population  is  as  follows.  While  the  majority  of  breast  cancers  are  believed  to 
develop  independently  of  an  individual’s  genotype,  it  is  clear  that  family  history  of  breast 
cancer,  particularly  in  a  premenopausal  first-degree  relative,  is  an  important  risk  factor 
for  the  development  of  this  disease.  This  increased  risk  is  likely  due  to  inheritance  of  a 
predisposing  genotype.  The  specific  genes  contributing  to  this  predisposition  are 
unknown  in  most  women  with  breast  cancer  and  a  positive  family  history.  Less  than  7% 
of  all  breast  cancers  are  thought  to  occur  in  the  setting  of  a  germline  mutation  in  BRCA1 
or  BRCA2  (2,3).  A  phenotype  assay,  such  as  the  one  described  in  this  report,  is  not 
dependent  on  the  discovery  of  these  unknown  genetic  conditions.  An  assay  that  can 
capture  a  common  downstream  functional  effect  of  a  variety  of  predisposing  mutations 
would  be  relevant  to  a  much  broader  population  of  women  than  a  genotype  sequencing 
approach. 

A  possible  shortcoming  of  using  radiation-induced  chromatid-breaks  as  a 
predictor  for  breast  cancer  development  is  that  this  phenotype  may  not  be  a  consistent 
consequence  of  all  predisposing  genetic  conditions.  For  example,  there  is  no  evidence 
that  individuals  with  Li-Fraumeni  syndrome  (a  germline  mutation  in  p53)  have  increased 
susceptibility  to  chromatid  breaks.  However,  mutations  in  BRCA1,  BRCA2,  and  ATM
all  affect  cellular  radiosensitivity  and  the  success  of  double-strand  break  repair  following 
ionizing  radiation.  Specifically,  both  BRCA1  and  Brca2  colocalize  with  Rad51 
following  radiation-induced  double-strand  injuries  (6,7).  Additionally,  normal  function 
of  BRCA1  is  required  for  transcription-coupled  repair  following  damage  from  ionizing 
radiation  (8).  Finally,  BRCA1  has  also  been  shown  to  associate  with  the
hRad50-hMre11-p95  complex  in  directing  a  cellular  DNA  damage  response  following  ionizing  radiation  (9).  A
third  tumor  suppressor  gene  that  may  have  relevance  to  breast  cancer  formation,  ATM, 
also  plays  a  critical  role  in  the  successful  repair  of  DNA  strand  breaks  following  radiation 
(10).  This  role  may  in  part  be  explained  by  the  finding  that  BRCA1  protein  function  is 
dependent  on  phosphorylation  by  the  ATM  protein  (11).  It  is  clear  that  homozygous 
mutations  in  any  of  these  three  genes  (BRCA1,  BRCA2,  or  ATM)  result  in  a 
radiosensitive  phenotype  (7-12). 

The  second  rationale  for  using  radiation  as  a  mutagen  for  our  experiment  is  that 
ionizing  radiation  is  the  most  clearly  recognized  environmental  carcinogen  for  breast 
cancer.  The  first  evidence  of  the  carcinogenic  effect  of  radiation  came  from  longitudinal 
studies  of  Japanese  atomic  bomb  survivors  (13).  The  importance  of  radiation  as  a  breast 
carcinogen  was  further  confirmed  by  the  findings  of  increased  breast  cancer  rates  in 
women  treated  with  radiation  for  nonmalignant  conditions  such  as  tuberculosis  and 
enlargement  of  the  thymus  (14,15).  The  use  of  radiation  as  a  cancer  treatment  also  has 
been  shown  to  carry  carcinogenic  risks.  In  a  large  study  of  girls  treated  with  mantle 
irradiation  for  Hodgkin’s  disease,  the  30-year  actuarial  risk  of  developing  breast  cancer 
approached  35%  (16).  Furthermore,  there  appeared  to  be  a  dose-response  relationship
between  radiation  exposure  and  breast  cancer  development. 

Our  finding  that  radiation-induced  chromatid  breaks  correlated  with  the  risk  of 
having  bilateral  breast  cancer  is  in  part  supported  by  animal  studies.  Ponnaiya  et  al. 
noted  that  the  significantly  higher  rate  of  radiation-induced  mammary  carcinoma  in
BALB/c  mice  compared  to  C57BL/6  mice  correlated  with  differences  in  radiation-induced
genomic  instability  in  mammary  epithelial  tissue  (17).  After  16  population
doublings,  irradiated  mammary  cells  from  BALB/c  mice  had  significantly  more
chromatid  breaks  than  those  from  C57BL/6  mice.  These  data  suggested  that  a  genotype  that
increases  breast  cancer  susceptibility  correlated  with  a  phenotype  of  sensitivity  to 
radiation-induced  chromatid  breaks. 

Following  Hsu  et  al.’s  initial  investigation  of  chromatid  breaks  in  breast  cancer 
patients,  a  number  of  other  investigators  have  also  evaluated  whether  a  chromatid  break 
assay  could  predict  the  risk  of  breast  cancer  development.  Three  series  with  relatively 
small  numbers  of  breast  cancer  cases  all  showed  an  increase  in  the  median  number  of
induced  lymphocyte  chromatid  breaks  in  cases  versus  controls  (18-20).  In  the  largest 
series  to  date,  Scott  et  al.  found  a  statistically  significant  increase  in  chromatid  breaks  per 
cell  in  135  women  with  a  single  breast  cancer  compared  to  105  controls  with  no  breast 
cancer  history  (21).  Together,  these  data  and  our  current  study  suggest  that  the
phenotype  of  sensitivity  to  radiation-induced  chromatid  breaks  correlates  with  the  risk  of 
breast  cancer  development.  However,  as  shown  in  Figure  1,  there  is  a  considerable 
degree  of  overlap  in  the  assay  results  between  cases  and  controls.  This  suggests  that  the 
assay  is  unlikely  to  develop  into  a  test  with  high  sensitivity  and  high  specificity.
Nonetheless,  the  test  may  be  of  clinical  value  for  an  individual  found  to  have  a  high  number
of  chromatid  breaks.  In  our  study,  a  value  of  0.65  or  greater  captured  40%  of  the  cases 
compared  with  only  5%  of  the  controls. 
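
The cutoff figures above can be restated in standard diagnostic-test terms. The following is a minimal sketch, not part of the original analysis: it takes the reported fractions (40% of cases and 5% of controls at or above 0.65) as the sensitivity and false-positive rate of a dichotomized assay and derives the likelihood ratios. The function names and the 10% pretest probability are hypothetical and chosen only for illustration. Because the cases were selected for bilateral disease, these values may not carry over to an unselected population.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test dichotomized at a single cutoff."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Values reported in the text for a cutoff of 0.65 chromatid breaks per cell.
SENSITIVITY = 0.40           # fraction of cases at or above the cutoff
FALSE_POSITIVE_RATE = 0.05   # fraction of controls at or above the cutoff
SPECIFICITY = 1.0 - FALSE_POSITIVE_RATE

# Assumption for illustration only; the report gives no pretest probability.
HYPOTHETICAL_PRETEST_PROBABILITY = 0.10

lr_positive, lr_negative = likelihood_ratios(SENSITIVITY, SPECIFICITY)
post_test = post_test_probability(HYPOTHETICAL_PRETEST_PROBABILITY, lr_positive)

print(f"Specificity: {SPECIFICITY:.2f}")
print(f"Positive likelihood ratio: {lr_positive:.2f}")
print(f"Negative likelihood ratio: {lr_negative:.2f}")
print(f"Post-test probability (hypothetical 10% pretest): {post_test:.2f}")
```

Under these assumptions, a result at or above the cutoff is roughly eight times as likely in a case as in a control, whereas a result below the cutoff changes the odds only modestly. This matches the text's point that the assay is informative mainly for individuals with a high number of breaks.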

Two  studies  that  have  investigated  radiation-induced  chromatid  breaks  in  first- 
degree  relatives  have  provided  further  evidence  that  the  radiosensitivity  noted  in  breast 
cancer  cases  is  genetically  based.  Patel  et  al.  reported  that  first-degree  relatives  of  breast 
cancer  patients  had  more  radiation-induced  chromosome  breaks  compared  to  controls 
(20).  Additionally,  Roberts  et  al.  recently  reported  that  62%  of  first-degree  relatives  of 
16  radiosensitive  breast  cancer  patients  from  the  Scott  et  al.  study  (21)  were  also 
radiosensitive  (22).  This  compared  to  a  rate  of  only  7%  in  first-degree  relatives  of  4 
breast  cancer  patients  with  a  low  number  of  chromatid  breaks  per  cell  (22).  Furthermore, 
Roberts  et  al.  modeled  the  inheritance  pattern  of  radiosensitivity  and  breast  cancer  and 
suggested  that  the  data  fit  with  a  marker  of  an  inherited  low-penetrance  breast  cancer 
predisposition  gene(s). 

A  potential  shortcoming  of  the  lymphocyte  assay  that  we  used  in  this  study  is  that 
lymphocyte  response  to  radiation  is  likely  dependent  on  a  number  of  factors.  For 
example,  it  is  possible  that  cytokines,  released  either  from  cancer  cells  or  in  response  to 
having  cancer,  can  affect  lymphocyte  response.  To  more  precisely  distinguish  the  genetic 
and  epigenetic  influences  on  lymphocyte  chromatid  breaks,  more  data  comparing  the 
results  of  the  mutagen  sensitivity  assay  in  individuals  with  known  predisposing  genotypes,
individuals  with  single  breast  cancers  and  no  family  history,  and  individuals  without  a
cancer  history  will  be  needed. 

In  conclusion,  accumulating  data  suggest  that  screening  for  the  phenotype  of
radiation-induced  chromatid  breaks  may  prove  useful  as  a  biological  predictor  for  breast
cancer  risk.  We  believe  that  the  preliminary  data  in  this  report  need  additional
confirmation,  as  the  aggregate  data  from  our  study  and  those  reported  in  the  literature  are
relatively  limited  and  are  subject  to  publication  bias  (negative  studies  of  this  type  are
unlikely  to  be  reported).

Figure  Legends


Figure  1.  Distribution  of  the  case  and  control  populations  as  a  function  of  the  mutagen
sensitivity  assay  results. 


Figure  2.  Distribution  of  the  bilateral  breast  cancer  patients  with  a  positive  (+  FH)  or
negative  (-  FH)  breast  cancer  family  history  as  a  function  of  the  mutagen  sensitivity  assay
results.